PREFACE


This book is an outgrowth of lectures on the theory of probability 
which the author has given at Stanford University for a number of 
years. At first a short mimeographed text covering only the elementary
parts of the subject was used for the guidance of students. As time 
went on and the scope of the courses was gradually enlarged, the necessity 
arose of putting into the hands of students a more elaborate exposition 
of the most important parts of the theory of probability. Accordingly
a rather large manuscript was prepared for this purpose. The author 
did not plan at first to publish it, but students and other persons who had 
opportunity to peruse the manuscript were so persuasive that publication 
was finally arranged. 

The book is arranged in such a way that the first part of it, consisting 
of Chapters I to XII inclusive, is accessible to a person without advanced 
mathematical knowledge. Chapters VII and VIII are, perhaps, exceptions.
The analysis in Chapter VII is rather involved and a better way
to arrive at the same results would be very desirable. At any rate, a
reader who does not have time or inclination to go through all the
intricacies of this analysis may skip it and retain only the final results, 
found in Section 11. Chapter VIII, though dealing with interesting 
and historically important problems, is not important in itself and may 
without loss be omitted by readers. Chapters XIII to XVI incorporate 
the results of modern investigations. Naturally they are more complex 
and require more mature mathematical preparation. 

Three appendices are added to the book. Of these the second is by 
far the most important. It gives an outline of the famous Tshebysheff- 
Markoff method of moments applied to the proof of the fundamental 
theorem previously established by another method in Chapter XIV. 

No one will dispute Newton's assertion: “In scientiis addiscendis
exempla magis prosunt quam praecepta.” But especially is it so in the
theory of probability. Accordingly, not only are a large number of 
illustrative problems discussed in the text, but at the end of each chapter 
a selection of problems is added for the benefit of students. Some of 
them are mere examples. Others are more difficult problems, or even 
important theorems which did not find a place in the main text. In all
such cases sufficiently explicit indications of solution (or proofs) are given. 

The book does not go into applications of probability to other sciences. 
To present these applications adequately another volume of perhaps
larger size would be required. 

No one is more aware than the author of the many imperfections in 
the plan of this book and its execution. To present an entirely satisfactory
book on probability is, indeed, a difficult task. But even with
all these imperfections we hope that the book will prove useful, especially
since it contains much material not to be found in other books on the
same subject in the English language. 

J. V. Uspensky. 

Stanford University, 

September, 1937.

INTRODUCTION TO
MATHEMATICAL PROBABILITY 

INTRODUCTION 

Quanto enim minus rationis terminis comprehendi posse
videbatur, quae fortuita sunt atque incerta, tanto admirabilior
ars censebitur, cui ista quoque subjacent.

Chr. Huygens, 

De ratiociniis in ludo aleae. 

1. It is always difficult to describe with adequate conciseness and
clarity the object of any particular science; its methods, problems, and 
results are revealed only gradually. But if one must define the scope 
of the theory of probability the answer may be this: The theory of 
probability is a branch of applied mathematics dealing with the effects of 
chance. Here we encounter the word “chance,” which is often used in
everyday language but with rather indefinite meaning. To make clearer
the idea conveyed by this word, we shall try first to clarify the opposite
idea expressed in the word “necessity.” Necessity may be logical or
physical. The statement “The sum of the angles in a triangle is equal
to two right angles” is a logical necessity, provided we assume the
axioms of Euclidean geometry; for in denying the conclusion of the
admitted premises, we violate the logical law of contradiction. 

The following statements serve to illustrate the idea of physical 
necessity: 

A piece of iron falls down if not supported. 

Water boils if heated to a sufficiently high temperature. 

A die thrown on a board never stands on its edge. 

The logical structure of all these statements is the same: When certain
conditions which may be termed “causes” are fulfilled, a definite effect
occurs of necessity. But the nature of this kind of necessity is different 
from that of logical necessity. The latter, with our organization of 
mind, appears absolute, while physical necessity is only a result of 
extensive induction. We have never known an instance in which water, 
heated to a high temperature, did not boil; or a piece of iron did not fall 
down; or a die stood on its edge. For that reason we are led to believe 
that in the preceding examples (and in innumerable similar instances) 
the effect follows from its “cause” of necessity.

Instead of the term “physical necessity” we may introduce the
abstract idea of “natural law.” Thus, it is a “natural law” that the
piece of iron left without support will fall down. Natural laws derived
from extensive experiments or observations may be called “empirical
laws” to distinguish them from theoretical laws. In all exact sciences 
which have reached a high degree of development, such as astronomy, 
physics, and chemistry, scientists endeavor to build up an abstract and 
simplified image of the infinitely complex physical world—an image 
which can be described in mathematical terms. With the help of 
hypotheses and some artificial concepts, it becomes possible to derive
mathematically certain laws which, when applied to the world of reality, 
represent many natural phenomena with an amazing degree of accuracy. 
It is true that in the development of the sciences it sometimes becomes 
necessary to recast the previously accepted image of the physical world, 
but it is remarkable that the fundamental theoretical laws even then 
undergo but slight modification in substance or interpretation. 

The chief endeavor of the exact sciences is the discovery of natural
laws, and their formulation is of the greatest importance to the promotion 
of human knowledge in general and to the extension of our powers over 
natural phenomena. 

Are the events caused by natural laws absolutely certain? No, 
but for all practical purposes they may be considered as certain. It is 
possible that one or another of the natural laws may fail, but such 
failure would constitute a real “miracle.” However, granted that the
possibility of miracles is consistent with the nature of scientific knowledge,
actually this possibility may be disregarded.

2. If the preceding explanations throw a faint light upon the concept 
of necessity, it now remains to illuminate by comparison some characteristic
features inherent in the concept of “chance.” To say that chance
is a denial of necessity is too vague a statement, but examples may help 
us to understand it better. 

If a die is thrown upon a board we are certain that one of the six faces
will turn up. But whether a particular face will show depends on what
we call chance and cannot be predicted. Now, in the act of tossing a 
die there are some conditions known to us: first, that it is nearly cubic 
in shape; further, if it is a good die, its material is as nearly as possible 
homogeneous. Besides these known conditions, there are other factors 
influencing the motion of the die which are completely inaccessible to our 
knowledge. First among them are the initial position and the impulse 
imparted by the player's hand. These depend on an “act of will,” an
agent which may act without any recognizable motivation, and therefore
they are outside the domain of rational knowledge. Second, supposing 
the initial conditions known, the complexity of the resulting motion 
defies any possibility of foreseeing the final result.

Another example: If equal numbers of white and black balls, which do
not differ in any respect except in color, are concealed in an urn, and we 
draw one of them blindly, it is certain that its color will be either white 
or black, but whether it will be black or white we cannot predict: that 
depends on chance. In this example we again have a set of known 
conditions: namely, that balls in equal numbers are white and black, and
that they are not distinguishable except in color. But the final result 
depends on other conditions completely outside our knowledge. First, 
we know nothing about the respective positions of the white and black 
balls; second, the choice of one or the other depends on an act of will. 

It is an observed fact that the numbers of marriages, divorces, births, 
deaths, suicides, etc., per 1,000 of population, in a country with nearly 
settled living conditions and during not too long a period of time, do not 
remain constant, but oscillate within comparatively narrow limits. For 
a given year it is impossible to predict what will be their numbers: that 
depends on chance. For, besides some known conditions, such as the 
level of prosperity, sanitation, and many other things, there are unnumbered
factors completely outside our knowledge.

Many other examples of a similar kind can be cited to illustrate the
notion of chance. They all possess a common logical structure which 
can be described as follows: an event A may materialize under certain 
known or “fixed” conditions, but not necessarily; for under the same fixed
conditions other events B, C, D, . . . are also possible. The materialization
of A depends also upon other factors completely outside our
control and knowledge. Consequently, whether A will materialize or 
not under such circumstances cannot be foreseen; the materialization of 
A is due to chance, or, to express it concisely, A is a contingent event. 

3. The idea of necessity is closely related to that of certainty. Thus 
it is “certain” that everybody will die in the due course of time. In
the same way the idea of chance is related to that of probability or likelihood.
In everyday language, the words “probability” and “probable”
are used with different shades of meaning. By saying, “Probably it will 
rain tomorrow,” we mean that there are more signs indicating rainy 
weather than fair for tomorrow. On the other hand, in the statement, 
“There is little probability in the story he told us,” the word “probability”
is used in the sense of credibility. But henceforth we shall use
the word as equivalent to the degree of credence which we may place 
in the possibility that some contingent event may materialize. The 
“degree of credence” implies an almost instinctive desire to compare 
probabilities of different events or facts. That such comparison is 
possible one can gather from the following examples: 

I live on the second floor and can reach the ground either by using 
the stairway or by jumping from the window. Either way I might be 
injured, though not necessarily. How do the probabilities of being
injured compare in the two cases? Everyone, no doubt, will say that
the probability of being injured by jumping from the window is “greater”
than the probability of being injured while walking down the stairway. 
Such universal agreement might be due either to personal experience or 
merely to hearsay about similar experiences of other persons. 

An urn contains an equal number of white and black balls that are 
similar in all respects except color. One ball is drawn. It may be either 
black or white. How do the probabilities of these two cases compare? 
One almost instinctively answers: “They are equal.”

Now, if there are 10 white balls and 1 black ball in the urn, what 
about the probabilities of drawing a white or a black ball? Again one 
would say without hesitation that the probability of drawing a white ball 
is greater than that of drawing a black ball. 

Thus, probability appears to be something which admits of comparisons
in magnitude, but so far only in the same way as in the intensity of
pain produced by piercing the skin with needles. 

But it is a noteworthy observation that men instinctively try to 
characterize probabilities numerically in a naive and unscientific manner. 
We read regularly in the sporting sections of newspapers, predictions
that in a coming race a certain horse has two chances against one to 
win over another horse, or that the chances of two football teams are as 
10 to 7, etc. No doubt experts do know much about the respective 
horses and their riders, or the comparative strengths of two competing 
football teams, but their numerical estimates of chances have no other 
merit than to show the human tendency to assign numerical values to 
probabilities which most likely cannot be expressed in numbers. 

It is possible that a man endowed with good common sense and ripe 
judgment can weigh all available evidence in order to compare the 
probabilities of the various possible outcomes and to direct his actions 
accordingly so as to secure profit for himself or for society. But precise 
conclusions can never be attained unless we find a satisfactory way to 
represent or to measure probabilities by numbers, at least in some cases. 

4. As in other fields of knowledge, in attempting to measure probabilities
by numbers, we encounter difficulties that cannot be avoided
except by making certain ideal assumptions and agreements. In 
geometry (we speak of applied and not of abstract geometry), before 
explaining how lengths of rectilinear segments can be measured, we must 
first agree on criteria of equality of two segments. Similarly, in dealing 
with probability, the first step is to answer the question: When may two 
contingent events be considered as equally probable or, to use a more 
common expression, equally likely? From the statements of Jacob 
Bernoulli, one of the founders of the mathematical theory of probability, 
one can infer the following criterion of equal probability:

Two contingent events are considered as equally probable if, after taking
into consideration all relevant evidence, one of them cannot be expected in
preference to the other.

Certainly there is some obscurity in this criterion, but it is hardly 
possible to substitute any better one. To be perfectly honest, we must 
admit that there is an unavoidable obscurity in the principles of all the 
sciences in which mathematical analysis is applied to reality. 

The application of Bernoulli's criterion to particular cases is beset
with difficulties and requires good common sense and keen judgment.
There is much truth in Laplace's statement: “La théorie des probabilités
n'est au fond que le bon sens réduit au calcul.”

To elucidate the nature of these difficulties, let us consider an urn 
filled with white and black balls, but in unknown proportion. The only
evidence we have, namely, that there are both white and black balls in 
the urn, in this case appears insufficient for any conclusion about the 
respective probabilities of drawing a white or a black ball. We instinctively
think of the numbers of the two kinds of balls, and, being in
ignorance on this point, we are inclined to suspend judgment. But if we
know that white and black balls are equal in number and distributed
without any sort of regularity, this knowledge appears sufficient to 
assume the equality of the probabilities of drawing a white or a black 
ball. It is possible that, perhaps unconsciously, we are influenced by the 
commonly known fact that if we repeatedly draw a ball out of the urn 
many times, returning the ball each time before drawing again, the white 
and the black balls appear in nearly equal numbers. 
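
The empirical fact alluded to here can be imitated, though of course not proved, by a pseudo-random experiment. The following is a minimal sketch; the urn size (50 white, 50 black), the number of drawings, the seed, and the function name are illustrative assumptions not found in the text.

```python
import random


def draw_frequencies(n_white, n_black, n_draws, seed=0):
    """Draw n_draws balls with replacement and return relative frequencies."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    urn = ["white"] * n_white + ["black"] * n_black
    counts = {"white": 0, "black": 0}
    for _ in range(n_draws):
        # Drawing and returning the ball is the same as choosing from the full urn.
        counts[rng.choice(urn)] += 1
    return {color: count / n_draws for color, count in counts.items()}


# Hypothetical urn: 50 white and 50 black balls, 10,000 drawings.
freqs = draw_frequencies(50, 50, 10_000)
print(freqs)
```

With a long series of drawings both frequencies should come out near one half, which is the observation the text suggests may influence our judgment.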

If an urn contains a certain number of identical balls distinguished 
from one another by some characteristic signs, for example, by the 
numbers 1, 2, 3, ... , the knowledge that the balls are identical and 
are distributed without regularity suffices in this case to cause us to 
conclude that the probabilities for drawing any of the balls should be 
considered as equal. Again, in so readily assuming this conclusion we 
may be influenced by the fact empirically observed (by ourselves or by 
others) that in a long series of drawings, with balls being restored to 
the urn after each withdrawal, the balls appear with nearly the same 
frequency. 

An ordinary die is tossed. Should we consider the possible numbers 
of points 1, 2, 3, 4, 5, 6 as equally probable? To pronounce any judgment,
we must know something about the die. If it is known that the
die has a regular cubic shape and that its material is homogeneous, we 
readily agree on the equal probabilities of all the numbers of points 
1, 2, 3, 4, 5, 6. And this a priori conclusion, based on Bernoulli's criterion,
agrees with the observed fact that each number of points does
appear nearly an equal number of times in a long series of throws, if the
die is a good one. However, if we only know that the die has a regular
shape, but not whether or not it is loaded, it is only sensible to suspend 
judgment. 

These examples show that before trying to apply Bernoulli's criterion,
we must have at our disposal some evidence the amount of which cannot
be determined by any general rules. It may be also that the reason a
priori must be supplemented by some empirical evidence. In some
cases, lacking sufficient grounds to assert equal probabilities for two
events, we may assume them as a hypothesis, to be kept until for some 
reason we are forced to abandon it. 

5. Besides the ticklish question: When are we entitled to consider
events as equally probable? there is another fundamental assumption 
required to make possible the measurement of probabilities by numbers. 

Events $a_1, a_2, \ldots, a_n$ form an exhaustive set of possibilities under
certain fixed conditions S if at least one of them must necessarily materialize.
They are mutually exclusive if any two of them cannot materialize
simultaneously. The fundamental assumption referred to consists in
the possibility of subdividing results consistent with the conditions S 
into a number of exhaustive, mutually exclusive, and equally likely 
events, or cases (as they are commonly called): 

$a_1, a_2, \ldots, a_n.$

This being granted, the probability of any one of these cases is assumed 
to be 1/n. 

An event A may materialize in several mutually exclusive particular 
forms: α, β, . . . λ; that is, if A occurs, then one and only one of the
events α, β, . . . λ occurs also, and conversely the occurrence of one of
these events necessitates the occurrence of A. Thus, if A consists in 
drawing an ace from a deck of cards, A may materialize in four mutually 
exclusive forms: as an ace of hearts, diamonds, clubs, or spades. 

Let an event A be represented by its particular forms $a_1, a_2, \ldots, a_m$,
which together with other events $a_{m+1}, a_{m+2}, \ldots, a_n$ constitute an
exhaustive set of mutually exclusive and equally likely cases consistent with
the conditions S. Events $a_1, a_2, \ldots, a_m$ are called cases favorable to A.

Definition of Mathematical Probability. If, consistent with conditions 
S, there are n exhaustive, mutually exclusive, and equally likely cases, and 
m of them are favorable to an event A, then the mathematical probability of 
A is defined as the ratio m/n. 

In drawing a card from a full deck there are 52 and no more mutually 
exclusive and equally likely cases; 4 of them are favorable for drawing an 
ace; hence the probability of drawing an ace is 4/52 = 1/13.

From an urn containing 10 white, 20 black, and 5 red balls, one ball is
drawn. Here, distinguishing individual balls, we have 35 equally likely
cases. Among them there are 10, 20, and 5 cases, favorable respectively
to a white, a black, or a red ball. Hence the probabilities of drawing a
white, a black, or a red ball are, respectively, 2/7, 4/7, and 1/7.
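
The definition m/n lends itself to direct counting. The sketch below, offered only as an illustration, enumerates the equally likely cases for the two examples just given and counts the favorable ones; the helper name `classical_probability` and the labels used for cards and balls are assumptions introduced here.

```python
from fractions import Fraction


def classical_probability(cases, is_favorable):
    """Return m/n: favorable cases over all equally likely cases."""
    cases = list(cases)
    m = sum(1 for case in cases if is_favorable(case))
    return Fraction(m, len(cases))


# A full deck: 13 denominations in each of 4 suits, 52 cases in all.
denominations = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(d, s) for d in denominations for s in suits]
p_ace = classical_probability(deck, lambda card: card[0] == "A")

# The urn with 10 white, 20 black, and 5 red balls: 35 cases in all.
urn = ["white"] * 10 + ["black"] * 20 + ["red"] * 5
p_color = {
    color: classical_probability(urn, lambda ball, color=color: ball == color)
    for color in ("white", "black", "red")
}

print(p_ace, p_color)
```

Exact fractions are used instead of decimals so that the results can be compared directly with the ratios stated in the text.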

In the first example, instead of 52 cases, we may consider only 13 
cases according to the denominations of the cards. These cases being
regarded as equally likely, there is only one of them favorable to an
ace. The probability of drawing an ace is 1/13. This observation makes
it clear that the subdivision of all possible results into equally likely 
cases can be done in various ways. To avoid contradictory estimations 
of the same probability we must always observe the following rules: 

Two events are equally likely if each of them can be represented by 
equal numbers of equally likely forms. 

Two events are not equally likely if they are represented by unequal 
numbers of equally likely forms. 

Thus, if two equally likely events are each represented by different 
numbers of their respective forms, then the latter cannot be considered as 
equally likely. 

Each card is characterized by its denomination and the suit to which 
it belongs. Noting denominations, we distinguish 13 cases, but each 
of these is represented by 4 new cases according to the suit to which the 
card belongs. Altogether we have, then, 52 cases recognized as equally 
likely; hence, the above-mentioned 13 cases should be considered as 
equally likely. 
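
The consistency of the two subdivisions can be checked by counting. In the sketch below, which is only an illustration with card labels chosen for convenience, each of the 13 denominations is shown to be represented by the same number of forms among the 52 cards, and both subdivisions are shown to give the same probability for an ace.

```python
from collections import Counter
from fractions import Fraction

denominations = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(d, s) for d in denominations for s in suits]

# Number of equally likely forms (cards) representing each denomination.
forms_per_denomination = Counter(d for d, _ in deck)

# First rule: equal numbers of equally likely forms, hence equally likely cases.
all_equal = len(set(forms_per_denomination.values())) == 1

# Probability of an ace from the 52-case and from the 13-case subdivision.
p_ace_52 = Fraction(forms_per_denomination["A"], len(deck))
p_ace_13 = Fraction(1, len(forms_per_denomination))

print(all_equal, p_ace_52, p_ace_13)
```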

In connection with the definition of mathematical probability, 
mention should be made of an important principle not always explicitly 
stated. If

$a_1, a_2, \ldots, a_m, b_1, b_2, \ldots, b_p$

are all mutually exclusive and equally likely cases consistent with
certain conditions, and the indication of the occurrence of an event B
makes cases $b_1, b_2, \ldots, b_p$ impossible, cases $a_1, a_2, \ldots, a_m$ still should be
considered as equally likely. To illustrate this principle, consider an
urn with six tickets bearing numbers 1, 2, ... 6. Two tickets are
drawn in succession. If nothing is known about the number of the first 
ticket, we still have six possibilities for the number of the second ticket, 
which we agree to consider as equally likely. But as soon as the number 
of the first ticket becomes known, then there are only five cases left 
concerning the number of the second ticket. According to the above 
principle we must consider these five cases as equally likely. 
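
The ticket example can be worked out by enumeration. The sketch below assumes, as an illustration not stated in this form in the text, that the 30 ordered pairs (first ticket, second ticket) are taken as the equally likely cases; the function name and the choice of 3 as the known first number are likewise hypothetical.

```python
from fractions import Fraction
from itertools import permutations

tickets = range(1, 7)
# All ordered pairs of distinct tickets: 6 * 5 = 30 cases, assumed equally likely.
pairs = list(permutations(tickets, 2))


def second_ticket_distribution(first=None):
    """Probability of each number on the second ticket.

    If `first` is given, only the cases consistent with that first ticket remain.
    """
    kept = [p for p in pairs if first is None or p[0] == first]
    return {
        t: Fraction(sum(1 for p in kept if p[1] == t), len(kept))
        for t in tickets
    }


unconditional = second_ticket_distribution()
given_first_is_3 = second_ticket_distribution(first=3)

print(unconditional)
print(given_first_is_3)
```

Nothing being known about the first ticket, each number has probability 1/6 for the second; once the first is known to be 3, the five remaining numbers each have probability 1/5 and the number 3 has probability 0.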

6. Probability as defined above is represented by a number contained
between 0 and 1. In the extreme case in which the probability is 0, it
indicates the impossibility of an event. On the contrary, in the other
extreme case in which the probability is 1, the event is certain. When
the probability is expressed by a number very near to 1, it means that
the overwhelming majority of cases are favorable to the event. On the 
contrary, a probability near to 0 shows that the proportion of favorable 
cases is small. 

From our experience we know that events with a small probability
seldom happen. For instance, if the probability of an event is
1/1,000,000, the situation may be likened to the drawing of a white ball 
from an urn containing 999,999 black balls and a single white one. 
This white ball is practically lost among the majority of black balls, and
for all practical purposes we may consider its extraction impossible. 
Similarly, the probability 999,999/1,000,000 may be considered, from a 
practical standpoint, as an indication of certainty. What limit for 
smallness of probability is to be set as an indication of practical impossibility?
Evidently there is no general answer to this question. Everything
depends on the risk we can face if, contrary to expectation, an
event with a small probability should occur. Hence, the main problem
of the theory of probability consists in finding cases in which the probability
is very small or very near to 1. Instead of saying, “The probability
is very near to 1,” we shall say, “great probability,” although,
of course, the probability can never exceed 1. 

7. The definition of mathematical probability in Sec. 5 is essentially
the classical definition proposed by Jacob Bernoulli and adopted by 
Laplace and almost all the important contributors to the theory of 
probability. But, since the middle of the nineteenth century (Cournot, 
John Stuart Mill, Venn), and especially in our days, the classical definition
has been severely criticized. Several attempts have been made to rear
up the edifice of the mathematical theory of probability on quite a 
different definition of mathematical probability. It does not enter into 
our plan to criticize these new definitions, but, in the opinion of the 
author, many of them are self-contradictory. Modern attempts to build
up the theory of probability as an axiomatic science may be interesting
in themselves as mental exercises; but from the standpoint of applications
the purely axiomatic science of probability would have no more
value than, for example, would the axiomatic theory of elasticity. 

The most serious objection to the classical definition is that it can 
be used only in very simple and comparatively unimportant cases like
games of chance. This objection, stressed by von Mises, is in reality 
not a new one. It is one of the objections Leibnitz made against Jacob 
Bernoulli's views concerning the possibility of applications of the theory 
of probability to various important fields of human endeavor and not 
merely to games of chance. 

It is certainly true that the classical definition cannot be directly 
applied in many important cases. But is it the fault of the definition
or is it rather due to our ignorance of the innermost mechanisms which,
apart from chance, contribute to the materialization or nonmaterialization
of contingent events? It seems that this is what Jacob Bernoulli
meant in his reply to Leibnitz:

Objiciunt primo, aliam esse rationem calculorum, aliam morborum aut mutationum
aeris; illorum numerum determinatum esse, horum indeterminatum et
vagum. Ad quod respondeo, utrumque respectu cognitionis nostrae aeque poni
incertum et indeterminatum; sed quicquam in se et sua natura tale esse, non
magis a nobis posse concipi, quam concipi potest, idem simul ab Auctore naturae
creatum esse et non creatum: quaecumque enim Deus fecit, eo ipso dum fecit,
etiam determinavit.¹

8. A brilliant example of how the profound study of a subject finally
makes it possible to apply the classical definition of mathematical 
probability is afforded in the fundamental laws of genetics (a science of 
comparatively recent origin, whose importance no one can deny), discovered
by the Augustinian monk, Gregor Mendel (1822-1884). During
eight years Mendel² conducted experimental work in crossing different
varieties of the common pea plant with the purpose of investigating how 
pairs of contrasting characters were inherited. For the pea plant there
are several pairs of such contrasting characters: round or wrinkled seeds, 
tallness or dwarfness, yellow or green pod color, etc. Let us concentrate 
our attention on a definite pair of contrasting characters, yellow or green
pod color. Peas with green pod color always breed true. Also some
peas with yellow color always breed true, while still others produce both
varieties. True breeding pea plants constitute two pure races: A with 
yellow pod color and B with green pod color, while plants with yellow 
pods not breeding true constitute a hybrid race, C. Crossing plants of
the race A with those of the race B and planting the seeds, Mendel
obtained a first generation $F_1$ of hybrids. Letting plants of the first
generation self-fertilize and again planting their seeds to produce the
second generation $F_2$, Mendel found that in this generation there were
428 yellow pod plants and 152 green pod plants in the ratio 2.82:1. 
In regard to other contrasting characters the ratio of approximately 3:1 
was observed in all cases. Later experimental work only confirmed
Mendel's results. Thus, combined experiments of Correns, Tschermak,
and others gave among 195,477 individuals of $F_2$, 146,802 yellow pod
plants and 48,675 green pod plants, in the ratio 3.016:1. 
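
The two ratios quoted can be verified by simple division. The following short sketch does no more than repeat that arithmetic with the counts given in the text; the function name is an assumption.

```python
def ratio_to_one(yellow, green):
    """Express the counts as the ratio yellow : green = r : 1 and return r."""
    return yellow / green


mendel = ratio_to_one(428, 152)            # Mendel's own second generation
combined = ratio_to_one(146_802, 48_675)   # combined later experiments

print(round(mendel, 2), round(combined, 3))
```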

1 To understand the beginning of this statement see the translation from “Ars
conjectandi” in Chap. VI, p. 105.

2 Mendel’s results were published in 1865, but passed completely unnoticed until 
in about 1900 the same facts were rediscovered by DeVries, Correns, and Tschermak. 
Modern genetics dates from about this time.

Mendel not only discovered such remarkable regularities, but also
suggested a rational explanation of the observed ratio 3:1, which with 
some modifications is accepted even today. Bodies of plants and 
animals are built up of enormous numbers of cells, among which the 
reproductive cells, or gametes, differ from the remaining “somatic”
cells in some important qualities. Cells are not homogeneous, but 
possess a definite structure. In somatic cells there are found bodies, 
called chromosomes, whose number is even and the same for the same 
species. Exactly half of this number of chromosomes is found in reproductive
cells. Chromosomes are supposed to be seats of hypothetical
“genes,” which are considered as bearers of various heritable characters.
A chromosome of one pure race A bearing a character a differs from the 
homologous chromosome of another pure race B bearing a contrasting 
character b in that they contain genes of different kinds. Since characters 
a and b are borne by definite chromosomes, the situation in regard to the
two characters a and b is exactly the same as if gametes of both races
contained just one chromosome. Let us represent them symbolically by
○ and ●. In the act of fertilization a pair of paternal and maternal
gametes conjugate and form a zygote, which by division and growth
produces all cells of the filial generation. Certain of these cells become
the germ cells and are set apart for the formation, by a complicated
process, of gametes, one half of which contain the chromosome of the
paternal type and the other half that of the maternal type.

According to this theory, in crossing two individuals belonging to
races A and B, zygotes of the first generation $F_1$ will be of the type
○-● and will produce gametes, in equal numbers, of the types ○, ●.
Now if two individuals of $F_1$ (hybrids) are crossed (or one individual
self-fertilized as in the cases of some plants), one paternal gamete conjugates
with one maternal, and for the resulting zygote there are four
possibilities:

○-○   ○-●   ●-○   ●-●

These possibilities may be considered as equally probable, whence 
the probabilities for an individual of the generation $F_2$ to belong respectively
to the races A, B, C are 1/4, 1/4, 1/2. Similarly, one easily finds that
in crossing an individual of the race A with one of the hybrid race C,
the probabilities of the offspring belonging to A or C are both equal to 1/2.
It is easy now to offer a rational explanation of the Mendelian ratio 
3:1. In the case of pea plants, individuals of the race A and hybrids 
are not distinguishable in regard to the color of their pods. Hence the 
probability of the offspring of a hybrid plant having yellow pods is
3/4, while for the offspring to have green pods the probability is 1/4.
When the generation $F_2$ consists of a great many individuals, the theory
of probability shows that the ratio of the number of yellow pod plants to
the number of green pod plants is not likely to differ much from the ratio 
3:1. In crossing plants of the race A with hybrids, the offspring, if 
numerous, will contain plants of race A or C, respectively, in a proportion 
which is not likely to differ much from 1:1. And this conclusion was 
experimentally verified by Mendel himself. 
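
The four equally probable zygotes can also be enumerated mechanically. The following Python sketch is only an illustration under our own assumptions: each parent is encoded as a pair of chromosome types labelled by the characters "a" and "b" they bear, and the function name `offspring_distribution` is ours, not part of the original text.

```python
from fractions import Fraction
from itertools import product


def offspring_distribution(parent1, parent2):
    """Race probabilities of the offspring of two parents.

    Each parent is a pair of chromosome types (hypothetical encoding);
    it passes on either of its two types with equal probability.
    """
    counts = {"A": 0, "B": 0, "C": 0}
    for g1, g2 in product(parent1, parent2):  # equally likely gamete pairs
        if g1 == g2 == "a":
            race = "A"          # pure race A
        elif g1 == g2 == "b":
            race = "B"          # pure race B
        else:
            race = "C"          # hybrid
        counts[race] += 1
    total = sum(counts.values())
    return {race: Fraction(c, total) for race, c in counts.items()}


hybrid = ("a", "b")
pure_a = ("a", "a")

f2 = offspring_distribution(hybrid, hybrid)        # crossing two hybrids
backcross = offspring_distribution(pure_a, hybrid)  # race A with a hybrid
p_like_a = f2["A"] + f2["C"]  # offspring indistinguishable from race A
print(f2, backcross, p_like_a)
```

Under these assumptions the sketch gives 1/4, 1/4, 1/2 for the races A, B, C in the second generation, and 3/4 for an offspring that looks like race A, which is the 3:1 ratio.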

9. If in the case of the Mendelian laws the profound study of the
mechanism of heredity together with hypothetical assumptions of the
kind used in physics, chemistry, etc., paved the way for a rational
explanation of observed phenomena on the basis of the theory of
probability, in many other important instances we are still unable to reach the
same degree of scientific understanding. Stability of statistical ratios
observed in many cases suggests the idea that they should be explained
on the basis of probability. For instance, it has been observed that
the ratio of human male and female births is nearly 51:50 for large
samples, and this is largely independent of climatic conditions, racial
differences, living conditions in different countries, etc. Although the
factors determining sex are known, yet some complications not
sufficiently cleared up prevent estimation of probabilities of male and female
births.

In all instances of the pronounced stability of statistical ratios we 
may believe that some day a way will be found to estimate probabilities 
in such cases. Therefore many applications of the theory of probability 
to important problems of other sciences are based on belief in the existence 
of the probabilities with which we are concerned. In other cases in 
which the theory of probability is used, we may have grave doubts 
as to whether this science is applied legitimately. The fact that many 
applications of probability are based on belief or faith should not
discourage us; for it is better to do something, though it may not be quite
reliable, than nothing. Only we must not be overconfident about the 
conclusions reached under such circumstances. 

After all, is not faith at the bottom of all scientific knowledge? 
Physicists speak of electrons, which never have been seen and are known 
only through their visible manifestations. Electrons are postulated 
just to coordinate into a coherent whole a large variety of observed 
phenomena. Is not this faith? It must be, for according to Paul
(Hebrews, 11:1), “Faith is the substance of things hoped for, the evidence
of things not seen.”

10. In concluding this introduction it remains to give a short account
of the history of the theory of probability. Although ancient
philosophers discussed at length the necessity and contingency of things, it
seems that mathematical treatment of probability was not known to the
ancients. Apart from casual remarks of Galileo concerning the correct
evaluation of chances in a game of dice, we find the true origin of the
science of probability in the correspondence between two great men of
the seventeenth century, Pascal (1623-1662) and Fermat (1601-1665).
A French nobleman, Chevalier de Méré, a man of ability and great
experience in gambling, asked Pascal to explain some seeming
contradictions between his theoretical reasoning and the observations gathered
from gambling. Pascal solved this difficulty and attacked another
problem proposed to him by de Méré. On hearing from Pascal about
these problems, Fermat became interested in them, and in their private
correspondence these two great men laid the first foundations of the
science of probability. Bertrand’s statement, “Les grands noms de
Pascal et de Fermat décorent le berceau de cette science,” cannot be
disputed.

Huygens (1629-1695), a great Dutch scientist, became acquainted
with the contents of this correspondence and, spurred on by the new
ideas, published in 1657 a first book on probability, “De ratiociniis in
ludo aleae,” in which many interesting and rather difficult problems on
probabilities in games of chance were solved. To him we owe the
concept of “mathematical expectation” so important in the modern
theory of probability.

Jacob Bernoulli (1654-1705) meditated on the subject of probability
for about twenty years and prepared his great book, “Ars conjectandi,”
which, however, was not published until 1713, eight years after his
death, by his nephew, Nicholas Bernoulli. Bernoulli envisaged the
subject from the most general point of view, and clearly foresaw a whole
field of applications of the theory of probability outside of the narrow
circle of problems relating to games of chance. To him is due the
discovery of one of the most important theorems known as “Bernoulli’s
theorem.”

The next great successor to Bernoulli is Abraham de Moivre (1667-1754),
whose most important work on probability, “The Doctrine of
Chances,” was first published in 1718 and twice reprinted, in 1738 and
in 1756. De Moivre does not contribute much to the principles, but this
work is justly renowned for new and powerful methods for the solution
of more difficult problems. Many important results, ordinarily
attributed to Laplace and Poisson, can be found in de Moivre’s book.

Laplace (1749-1827), whose contributions to celestial mechanics
assured him everlasting fame in the history of astronomy, was very
much interested in the theory of probability from the very beginning of
his scientific career. After writing several important memoirs on the
subject, he finally published, in 1812, his great work “Théorie analytique
des probabilités,” accompanied by a no less known popular exposition,
“Essai philosophique sur les probabilités,” destined for the general
educated public. Laplace’s work, on account of the multitude of new
ideas, new analytic methods, and new results, in all fairness should be
regarded as one of the most outstanding contributions to mathematical
literature. It exercised a great influence on later writers on probability
in Europe, whose work chiefly consisted in elucidation and development
of topics contained in Laplace’s book.

Thus in European countries further development of the theory of
probability was somewhat retarded. But the subject took on important
developments in the works of Russian mathematicians: Tshebysheff
(1821-1894) and his former students, A. Markoff (1856-1922) and A.
Liapounoff (1858-1918). Castelnuovo in his fine book “Calcolo delle
probabilità” rightly regards the contributions to the theory of probability
due to Russian mathematicians as the most important since the time of
Laplace.

At the present time interest in the theory of probability is revived
everywhere, but again the most outstanding recent contributions have
been made in Russia, chiefly by three prominent mathematicians: S.
Bernstein, A. Khintchine, and A. Kolmogoroff.

In closing this introduction it seems proper to quote the closing
words of the “Essai philosophique sur les probabilités”:

On voit par cet Essai, que la théorie des probabilités n’est au fond, que le bon
sens réduit au calcul: elle fait apprécier avec exactitude, ce que les esprits justes
sentent par une sorte d’instinct, sans qu’ils puissent souvent s’en rendre compte.
Elle ne laisse rien d’arbitraire dans le choix des opinions et des partis à prendre,
toutes les fois que l’on peut, à son moyen, déterminer le choix le plus avantageux.
Par là, elle devient le supplément le plus heureux, à l’ignorance et à la faiblesse
de l’esprit humain. Si l’on considère les méthodes analytiques auxquelles cette
théorie a donné naissance, la vérité des principes qui lui servent de base, la
logique fine et délicate qu’exige leur emploi dans la solution des problèmes, les
établissements d’utilité publique qui s’appuient sur elle, et l’extension qu’elle a
reçue et qu’elle peut recevoir encore, par son application aux questions les plus
importantes de la Philosophie naturelle et des sciences morales; si l’on observe
ensuite, que dans les choses mêmes qui ne peuvent être soumises au calcul, elle
donne les aperçus les plus sûrs qui puissent nous guider dans nos jugements,
et qu’elle apprend à se garantir des illusions qui souvent nous égarent; on verra
qu’il n’est point de science plus digne de nos méditations.

CHAPTER I


COMPUTATION OF PROBABILITIES BY DIRECT 
ENUMERATION OF CASES 


1. The probability of an event can be found by direct application 
of the definition when it is possible to make a complete enumeration of 
all equally likely cases, as well as of those favorable to that event. Here 
we shall consider a few problems, beginning with the simplest, to illustrate 
this direct method of evaluating probabilities. 

Problem 1. Two dice are thrown. What is the probability of 
obtaining a total of 7 or 8 points? 

Solution. Suppose we distinguish the dice by the numbers 1 and 2. 
There are 6 possible cases as to the number of points on the first die; 
and each of these cases can be accompanied by any of the 6 possible 
numbers of points on the second die. Hence, we can distinguish
altogether 6 X 6 = 36 different cases. Provided the dice are ideally regular
in shape and perfectly homogeneous, we have good reason to consider 
these 36 cases as equally likely, and we shall so consider them. 

Next, let us find out how many cases are favorable to the total of
7 points. This may happen only in the following ways:

First Die      1  2  3  4  5  6
Second Die     6  5  4  3  2  1

Likewise, for 8 points:

First Die      2  3  4  5  6
Second Die     6  5  4  3  2


That is, out of the total number of 36 cases there are 6 cases favorable
to 7 points and 5 cases favorable to 8 points; hence, the probability of
obtaining 7 points is 6/36 = 1/6, and the probability of obtaining 8 points is 5/36.
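
The enumeration is small enough to be checked mechanically. The following Python sketch lists the 36 ordered cases and counts the favorable ones; the names `outcomes` and `prob_total` are ours and serve only for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered cases (first die, second die), assumed equally likely.
outcomes = list(product(range(1, 7), repeat=2))


def prob_total(total):
    """Probability that the two dice show the given total of points."""
    favorable = [o for o in outcomes if sum(o) == total]
    return Fraction(len(favorable), len(outcomes))


p7 = prob_total(7)
p8 = prob_total(8)
print(p7, p8)
```

Exact fractions are used so that the results can be compared directly with the values found by hand.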
2. Problem 2. A coin is tossed three times in succession. What 
is the probability of obtaining 2 heads? What is the probability of 
obtaining tails at least once?

Solution. In the first throw there are two possible cases: heads or
tails. And if the coin is unbiased (which we assume is true) these two
cases may be considered as equally likely. In two throws there are
2 X 2 = 4 cases; namely, both of the two possible cases in the first toss
can combine with both of the possible cases in the second. Similarly,
in three throws the number of cases will be 2 X 2 X 2 = 8. To find
the number of cases favorable to obtaining 2 heads, we must consider
that this can happen only in three ways:

Heads Heads Tails
Heads Tails Heads
Tails Heads Heads

The number of favorable cases being 3, the probability of obtaining
two heads is 3/8.

To answer the second part of the question, we observe that there is
only one case when tails does not turn up. Therefore, the number of
cases favorable to obtaining tails at least once is 8 - 1 = 7, so that
the required probability is 7/8.
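
As a check on the count of cases, the eight sequences can be written out by a short Python sketch. The encoding of a toss as the letter "H" or "T" and the variable names are our own choices.

```python
from fractions import Fraction
from itertools import product

# The 8 equally likely sequences of three tosses.
tosses = list(product("HT", repeat=3))

# Sequences with exactly two heads, and sequences with at least one tail.
two_heads = [t for t in tosses if t.count("H") == 2]
some_tail = [t for t in tosses if "T" in t]

p_two_heads = Fraction(len(two_heads), len(tosses))
p_some_tail = Fraction(len(some_tail), len(tosses))
print(p_two_heads, p_some_tail)
```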

3. Problem 3. Two cards are drawn from a deck of well-shuffled 
cards. What is the probability that both the extracted cards are 
aces? 

Solution. Since there are 52 cards in the deck, there are 52 ways 
of extracting the first card. After the first card has been withdrawn, 
the second extracted card may be one of the remaining 51 cards.
Therefore, the total number of ways to draw two cards is 52 X 51. All these
cases may be considered as equally likely.

To find the number of cases favorable to drawing aces, we observe
that there are 4 aces; therefore, there are 4 ways to get the first ace.
After it has been extracted, there are 3 ways to get a second ace. Hence,
the total number of ways to draw 2 aces is 4 X 3, and the required
probability is:

$$\frac{4 \times 3}{52 \times 51} = \frac{1}{13 \times 17} = \frac{1}{221}.$$

Problem 4. Two cards are drawn from a full pack, the first card
being returned to the pack before the second is taken. What is the
probability that both the extracted cards belong to a specified suit?

Solution. There are 52 ways of getting the first card. For the
second drawing, there are also 52 ways, because by returning the first
extracted card to the pack, the original number was restored. Under
such circumstances, the total number of ways to extract two cards is
52 X 52. Now, because there are 13 cards in a suit, the number of
cases favorable to obtaining two cards of a specified suit is 13 X 13.
Therefore, the required probability is given by:

$$\frac{13 \times 13}{52 \times 52} = \frac{1 \times 1}{4 \times 4} = \frac{1}{16}.$$
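
The difference between drawing without and with replacement in Problems 3 and 4 can be seen by enumerating both sets of ordered draws. In the Python sketch below the encoding of a card as a pair (rank, suit), with rank 0 standing for the ace and suit 0 for the specified suit, is an assumption made purely for illustration.

```python
from fractions import Fraction
from itertools import permutations, product

# Hypothetical encoding: a card is (rank, suit); rank 0 stands for the ace.
deck = [(rank, suit) for rank in range(13) for suit in range(4)]

# Problem 3: ordered draws of two cards without replacement (52 X 51 cases).
draws = list(permutations(deck, 2))
both_aces = [d for d in draws if d[0][0] == 0 and d[1][0] == 0]
p_aces = Fraction(len(both_aces), len(draws))

# Problem 4: the first card is returned, so there are 52 X 52 cases;
# suit 0 plays the part of the specified suit.
draws_repl = list(product(deck, repeat=2))
same_suit = [d for d in draws_repl if d[0][1] == 0 and d[1][1] == 0]
p_suit = Fraction(len(same_suit), len(draws_repl))

print(p_aces, p_suit)
```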

4. Problem 5. An urn contains 3 white and 5 black balls. One
ball is drawn. What is the probability that it is black?

Solution. The total number of balls is 8. To distinguish them, we
may imagine that they are numbered. As to the number on the ball
drawn, there are 8 possible cases that may reasonably be considered as
equally likely. Obviously, there are 5 cases favorable to the black color
of the ball drawn. Therefore, the required probability is 5/8.

By a slight modification of the last problem, we come to the following
interesting situation:

Problem 6. The contents of the urn are the same as in the foregoing
problem. But this time we suppose that one ball is drawn, and, its color
unnoted, laid aside. Then another ball is drawn, and we are required to
find the probability that it is black or white.

Solution. Suppose again that the balls are numbered, so that the
white balls bear numbers 1, 2, and 3; and the black balls bear numbers
4, 5, 6, 7, 8. Obviously, there are 8 ways to get the first ball, and
whatever it is, there remain only 7 ways to get the second ball. The total
number of equally likely cases is 8 X 7 = 56.

It is a little more difficult to find the number of cases favorable to
extracting a white or black ball in the second drawing. Suppose we are
interested in the white color of the second ball. If the first ball drawn is
a white one, it may bear one of the numbers 1 to 3. Whatever this
number is, the second ball, if it is white, can bear only the two remaining
numbers. Therefore, under the assumption that the first ball is a white
one, the number of favorable cases is 3 X 2 = 6. Again, supposing that
the first ball drawn is black, we have 5 possibilities as to its number, and,
corresponding to any one of these possibilities, there are 3 possibilities
as to the number of the white ball to be taken in the second drawing,
so that the number of favorable cases now is 5 X 3 = 15. The number
of all favorable cases is 6 + 15 = 21. The required probability for
the white ball is 21/56 = 3/8. In the same way, we should find
that the probability for the black ball is 5/8. It is remarkable that
these two probabilities are the same as if only a single ball had been
drawn.

The situation is quite different if we know the color of the first ball.
Suppose, for instance, that it is white. The total number of equally
likely cases will then be 3 X 7 = 21; and the number of cases favorable
to getting another white ball is 3 X 2 = 6, so that the probability in
this case is 6/21 = 2/7.

This last example shows clearly how much probability depends upon
a given or known set of conditions.
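
Both parts of this discussion can be verified by listing the 56 ordered pairs of balls. The Python sketch below is an illustration only; the balls are numbered from 0 to 7 rather than from 1 to 8, and the helper `color` is our own.

```python
from fractions import Fraction
from itertools import permutations

balls = ["W"] * 3 + ["B"] * 5            # 3 white and 5 black balls
pairs = list(permutations(range(8), 2))  # 56 ordered (first, second) draws


def color(i):
    """Color of the ball numbered i (0 to 7 in this sketch)."""
    return balls[i]


# Color of the first ball unnoted: all 56 cases count.
p_second_white = Fraction(sum(color(s) == "W" for _, s in pairs), len(pairs))
p_second_black = Fraction(sum(color(s) == "B" for _, s in pairs), len(pairs))

# First ball known to be white: only 3 X 7 = 21 cases remain.
first_white = [(f, s) for f, s in pairs if color(f) == "W"]
p_white_given_white = Fraction(
    sum(color(s) == "W" for _, s in first_white), len(first_white)
)

print(p_second_white, p_second_black, p_white_given_white)
```

The only change between the two computations is the set of cases admitted as possible, which is exactly the point of the remark about known conditions.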

5. Problem 7. Three boxes, identical in appearance, each have two
drawers. The first box contains a gold coin in each drawer; the second
contains a silver coin in each drawer; but the third contains a gold coin
in one drawer and a silver coin in the other. (a) A box is chosen at
random. What is the probability that it contains coins of different metals?
(b) A box is chosen, one of its drawers opened, and a gold coin found.
What is the probability that the other drawer contains a silver coin?

Solution. (a) Since nothing outwardly distinguishes one box from
the other, we may recognize three equally likely cases, and among them
is only one case of a box with coins of different metals. Therefore, we
estimate the required probability as 1/3.

(b) As to the second question, one is tempted to reason as follows:
The fact that a gold coin was found in one drawer leaves only two
possibilities as to the content of the other drawer; namely, that the coin
in it is either gold or silver. Hence, the probability of a silver coin in
the second drawer seems to be 1/2. But this reasoning is fallacious.
It is true that, when the gold coin is found in one drawer, there are only
two possibilities left as to the content of the other drawer; but these
possibilities cannot be considered as equally likely. To see this point
clearly, let us distinguish the drawers of the first box by the numbers 1
and 2; those of the second box by the numbers 3 and 4; finally, in the
third box, 5 will distinguish the drawer containing the silver coin, while
6 will represent the drawer with the gold coin.

Instead of three equally likely cases: 

box 1, box 2, box 3 

we now have six cases: 

drawers 1, 2; drawers 3, 4; drawers 5, 6, 

which, with reference to the fundamental assumptions, must be
considered as equally likely. If nothing were known about the contents
of the drawer which has been opened, the number of this drawer might be
either 1, 2, 3, 4, 5, or 6. But as soon as the gold coin is discovered in it,
cases 3, 4, and 5 become impossible, and there remain three equally likely
assumptions as to the number of the opened drawer: it may be either 1 or
2 or 6. That leaves three cases, and in only one of them, namely, in
case 6, will the other drawer contain a silver coin. Thus the answer
to the second question (b) is 1/3.
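
The argument by numbered drawers translates directly into a short enumeration. In the Python sketch below the dictionaries `content` and `partner` are our own device for recording what each drawer holds and which drawer shares its box; the drawer numbers follow the text.

```python
from fractions import Fraction

# Drawers 1, 2: first box (gold, gold); drawers 3, 4: second box
# (silver, silver); drawer 5 (silver) and drawer 6 (gold): third box.
content = {1: "gold", 2: "gold", 3: "silver", 4: "silver",
           5: "silver", 6: "gold"}
partner = {1: 2, 2: 1, 3: 4, 4: 3, 5: 6, 6: 5}  # other drawer of the same box

# Equally likely cases that remain once a gold coin has been found.
opened_gold = [d for d in sorted(content) if content[d] == "gold"]

# Among them, those in which the other drawer holds a silver coin.
other_silver = [d for d in opened_gold if content[partner[d]] == "silver"]

p_silver = Fraction(len(other_silver), len(opened_gold))
print(opened_gold, other_silver, p_silver)
```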

6. In the preceding problems the enumeration of cases did not
present any difficulty. We are now going to discuss a few problems in
which this enumeration is not so obvious but can be greatly simplified
by the use of well-known formulas for the number of permutations,
arrangements, and combinations.

Let m distinct objects be represented by the letters a, b, c, . . . , l.
Using all these objects, we can place them in different orders and form
“permutations.” For instance, if there are only three letters, a, b, and c,
all the possible permutations are: abc, acb, bac, bca, cab, cba; that is, 6 different
permutations out of 3 letters. In general, the number of permutations
$P_m$ of m objects is expressed by

$$P_m = 1 \cdot 2 \cdot 3 \cdots m = m!$$

If n objects are taken out of the total number of m objects to form
groups, attention being paid to the order of objects in each group, then
these groups are called “arrangements.” For instance, by taking two
letters out of the four letters a, b, c, d, we can form the following 12
arrangements:

ab  ba  ca  da
ac  bc  cb  db
ad  bd  cd  dc

Denoting by the symbol $A_m^n$ the number of arrangements of m
objects taken n at a time, the following formula holds:

$$A_m^n = m(m - 1)(m - 2) \cdots (m - n + 1).$$

Again, if we form groups of n objects taken out of the total number of
m objects, this time paying no attention to the order of objects in the
group, we form “combinations.” For instance, following are the
different combinations out of 5 objects taken 3 at a time:

abc  abd  abe  acd  ace
ade  bcd  bce  bde  cde

In general, the number of combinations out of m objects taken n
at a time, which is usually denoted by the symbol $C_m^n$, is given by

$$C_m^n = \frac{m(m - 1)(m - 2) \cdots (m - n + 1)}{1 \cdot 2 \cdot 3 \cdots n}.$$

It is useful to recall that the same expression may also be exhibited
as follows:

$$C_m^n = \frac{m!}{n!\,(m - n)!},$$

whence, by substituting m - n instead of n, the useful formula

$$C_m^{m-n} = C_m^n$$

can be derived.

7. After these preliminary remarks, we can turn to the problems in
which the foregoing formulas will often be used.
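
Before using them, the three counting formulas can be compared with explicit listings of the groups. The Python sketch below relies on the standard library functions `math.factorial`, `math.perm` and `math.comb` (the last two require Python 3.8 or later); the variable names are ours.

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

# Permutations of 3 letters: P_3 = 3! = 6.
perms = ["".join(p) for p in permutations("abc")]

# Arrangements of 4 letters taken 2 at a time: 4 * 3 = 12.
arrangements = ["".join(p) for p in permutations("abcd", 2)]

# Combinations of 5 letters taken 3 at a time: (5 * 4 * 3) / (1 * 2 * 3) = 10.
combos = ["".join(c) for c in combinations("abcde", 3)]

print(len(perms), factorial(3))
print(len(arrangements), perm(4, 2))
print(len(combos), comb(5, 3))

# The symmetry C(m, m - n) = C(m, n), checked for small m.
symmetric = all(
    comb(m, m - n) == comb(m, n) for m in range(1, 12) for n in range(m + 1)
)
print(symmetric)
```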

Problem 8. An urn contains a white balls and b black balls. If
$\alpha + \beta$ balls are drawn from this urn, find the probability that among
them there will be exactly $\alpha$ white and $\beta$ black balls.

Solution. If we do not distinguish the order in which the balls come
out of the urn, the total number of ways to get $\alpha + \beta$ balls out of the
total number a + b balls is obviously expressed by $C_{a+b}^{\alpha+\beta}$, and this is
the number of all possible and equally likely cases in this problem. The
number of ways to draw $\alpha$ white balls out of the total number a of white
balls in the urn is $C_a^{\alpha}$; and similarly $C_b^{\beta}$ represents the number of ways
of drawing $\beta$ black balls out of the total number b of black balls. Now
every group of $\alpha$ white balls combines with every possible group of $\beta$
black balls to form the total of $\alpha$ white balls and $\beta$ black balls, so that
the number of ways to form all the groups containing $\alpha$ white balls and
$\beta$ black balls is $C_a^{\alpha} \cdot C_b^{\beta}$. This is also the number of favorable cases;
hence, the required probability is

$$p = \frac{C_a^{\alpha} \cdot C_b^{\beta}}{C_{a+b}^{\alpha+\beta}}$$

or, in a more explicit form,

$$p = \frac{1 \cdot 2 \cdots (\alpha + \beta)}{1 \cdot 2 \cdots \alpha \cdot 1 \cdot 2 \cdots \beta} \cdot \frac{a(a - 1) \cdots (a - \alpha + 1) \cdot b(b - 1) \cdots (b - \beta + 1)}{(a + b)(a + b - 1) \cdots (a + b - \alpha - \beta + 1)}. \tag{1}$$
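
For small urns the formula can be compared with a direct count of the groups. The Python sketch below is a check under our own assumptions: the function names `p_formula` and `p_enumerated` and the sample values (4 white and 6 black balls, 2 white and 3 black required) are chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def p_formula(a, b, alpha, beta):
    """Probability of alpha white and beta black balls, by the formula."""
    return Fraction(comb(a, alpha) * comb(b, beta), comb(a + b, alpha + beta))


def p_enumerated(a, b, alpha, beta):
    """The same probability by listing every group of alpha + beta balls."""
    urn = ["W"] * a + ["B"] * b
    groups = list(combinations(range(a + b), alpha + beta))
    favorable = sum(
        1 for g in groups if sum(urn[i] == "W" for i in g) == alpha
    )
    return Fraction(favorable, len(groups))


example = p_formula(4, 6, 2, 3)
print(example, p_enumerated(4, 6, 2, 3))
```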


Problem 9. An urn contains n tickets bearing numbers from 1 to n,
and m tickets are drawn at a time. What is the probability that i of
the tickets removed have numbers previously specified?

Solution. This problem does not essentially differ from the preceding
one. In fact, i tickets with preassigned numbers can be likened to i
white balls, while the remaining tickets correspond to the black balls.
The required probability, therefore, can be obtained from the expression
(1) by taking $a = i$, $b = n - i$, $\alpha = i$, $\beta = m - i$ and, all simplifications
performed, will be given by

$$\frac{m(m - 1) \cdots (m - i + 1)}{n(n - 1) \cdots (n - i + 1)}. \tag{2}$$


The conditions of this problem were realized in the French lottery, 
which was operated by the French royal government for a long time but 
discontinued soon after the Revolution of 1789. Similar lotteries 
continued to exist in other European countries throughout the nineteenth 
century. In the French lottery, tickets bearing numbers from 1 to 90
were sold to the people, and at regular intervals drawings for winning
numbers were held in different French cities. At each drawing, 5
numbers were drawn. If a holder of tickets won on a single number,
he received 15 times its cost to him. If he won on two, three, four, or
five tickets, he could claim respectively 270, 5,500, 75,000, and, finally,
1,000,000 times their cost to him.

The numerical values of the probabilities corresponding to these
different cases are worked out as follows: we must take n = 90, m = 5,
and i = 1, 2, 3, 4, or 5 in the expression (2). The results are


Single ticket     5/90 = 1/18
Two tickets       (5 · 4)/(90 · 89) = 2/801
Three tickets     (5 · 4 · 3)/(90 · 89 · 88) = 1/11748
Four tickets      (5 · 4 · 3 · 2)/(90 · 89 · 88 · 87) = 1/511038
Five tickets      (5 · 4 · 3 · 2 · 1)/(90 · 89 · 88 · 87 · 86) = 1/43949268
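
These five values can be reproduced by evaluating the expression (2) with exact fractions. In the Python sketch below the function name `lottery_probability` is ours; the parameters n, m and i have the meaning given in Problem 9.

```python
from fractions import Fraction


def lottery_probability(n, m, i):
    """Expression (2): m(m-1)...(m-i+1) divided by n(n-1)...(n-i+1)."""
    p = Fraction(1)
    for j in range(i):
        p *= Fraction(m - j, n - j)
    return p


# French lottery: 90 numbers, 5 drawn, a bet on i = 1, ..., 5 numbers.
table = {i: lottery_probability(90, 5, i) for i in range(1, 6)}
for i, p in table.items():
    print(i, p)
```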


8. Problem 10. From an urn containing a white balls and b black
ones, a certain number of balls, k, is drawn, and they are laid aside, their
color unnoted. Then one more ball is drawn; and it is required to find
the probability that it is a white or a black ball.

Solution. Suppose the k balls removed at first and the last ball
drawn are laid on k + 1 different places, so that the last ball occupies
the position at the extreme right. The number of ways to form groups
of k + 1 balls out of the total number of a + b balls, attention being
paid to the order, is

$$(a + b)(a + b - 1) \cdots (a + b - k).$$

Such is the total number of cases in this problem, and they may all be
considered as equally likely. To find the number of cases favorable to
a white ball, we observe that the last place should be occupied by one of
the a white balls. Whatever this white ball is, the preceding k balls
form one of the possible arrangements out of a + b - 1 remaining balls
taken k at a time. Hence, it is obvious that the number of cases favorable
to a white ball is

$$a(a + b - 1) \cdots (a + b - k),$$

and therefore the required probability is given by

$$\frac{a}{a + b}$$

for a white ball. In a similar way we find the probability b/(a + b) of
drawing a black ball. These results show that the probability of getting
white or black balls in this problem is the same as if no balls at all were
removed at first. Here we have proof that the peculiar circumstances
observed in Prob. 6 are general.
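
The conclusion can be tested for a small urn by listing every arrangement of k + 1 balls and looking at the last place. The Python sketch below is only a numerical check; the function name `p_last_white` and the sample urn of 3 white and 4 black balls are our own choices.

```python
from fractions import Fraction
from itertools import permutations


def p_last_white(a, b, k):
    """Probability that the (k + 1)th ball is white, by full enumeration."""
    urn = ["W"] * a + ["B"] * b
    arrangements = list(permutations(range(a + b), k + 1))
    favorable = sum(1 for arr in arrangements if urn[arr[-1]] == "W")
    return Fraction(favorable, len(arrangements))


# With 3 white and 4 black balls the result is 3/7 whatever k is.
results = [p_last_white(3, 4, k) for k in range(4)]
print(results)
```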

9. Problem 11. Two dice are thrown n times in succession. What is
the probability of obtaining double six at least once?

Solution. As there are 36 cases in every throw and each case of the
first throw can combine with each case of the second throw, and so on,
the total number of cases in n throws will be $36^n$. Instead of trying to
find the number of favorable cases directly, it is easier to find the number
of unfavorable cases; that is, the number of cases in which double sixes
would be excluded. In one throw there are 35 such cases, and in n throws
there will be $35^n$. Now, excluding these cases, we obtain $36^n - 35^n$
favorable cases; hence, the required probability is

$$p = 1 - \left(\frac{35}{36}\right)^n.$$

If one die were thrown n times in succession, the probability to obtain
6 points at least once would be

$$p = 1 - \left(\frac{5}{6}\right)^n.$$

Now, suppose we want to find the number of throws sufficient to
assure a probability > 1/2 of obtaining double six at least once. To this
end we must solve the inequality

$$\left(\frac{35}{36}\right)^n < \frac{1}{2}$$

for n; whence we find

$$n > \frac{\log 2}{\log 36 - \log 35} = 24.6 \ldots$$

It means that in 25 throws there is more likelihood to obtain double
six at least once than not to obtain it at all. On the other hand, in
24 throws, we have less chance to succeed than to fail.

Now, if we dealt with a single die, we should find that in 4 throws 
there are more chances to obtain 6 points at least once than there are 
chances to fail. 
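
The two thresholds, 25 throws for double six and 4 throws for a single six, can be recomputed in a few lines. The Python sketch below uses exact fractions for the comparison with 1/2; the function names `p_at_least_once` and `throws_needed` are ours.

```python
from fractions import Fraction
from math import log


def p_at_least_once(p_single, n):
    """Probability of at least one success in n independent throws."""
    return 1 - (1 - p_single) ** n


def throws_needed(p_single):
    """Smallest n for which the probability of success exceeds 1/2."""
    n = 1
    while p_at_least_once(p_single, n) <= Fraction(1, 2):
        n += 1
    return n


n_double_six = throws_needed(Fraction(1, 36))  # two dice, double six
n_single_six = throws_needed(Fraction(1, 6))   # one die, six points
bound = log(2) / (log(36) - log(35))           # the bound found above

print(n_double_six, n_single_six, round(bound, 1))
```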

This problem is interesting in a historical respect, for it was the first
problem on probability solved by Pascal, who, together with his great
contemporary Fermat, had laid the first foundations of the theory of
probability. This problem was suggested to Pascal by a certain French
nobleman, Chevalier de Méré, a man of great experience in gambling.
He had observed the advantage of betting for double six in 25 throws
and for one six (with a single die) in 4 throws. He found it difficult to
understand because, he said, there were 36 cases for two dice and 6 cases
for one die in each throw, and yet it is not true that 25:4 = 36:6. Of
course, there is no reason for such an arbitrary conclusion, and the
correct solution as given by Pascal not only removed any apparent paradoxes
in this case, but it led to the same number, 25, observed by gamblers in
their daily experience.

10. Problem 12. A certain number n of identical balls is distributed
among N compartments. What is the probability that a certain
specified compartment will contain h balls?

Solution. To find the number of all possible cases in this problem,
suppose that we distinguish the balls by numbering them from 1 to n.
The ball with the number 1 may fall into any of the N compartments,
which gives N cases. The ball with the number 2 may also fall into any
one of the N compartments; so that the number of cases for 2 balls will
be $N \cdot N = N^2$. Likewise, for 3 balls the number of cases will be

$$N^2 \cdot N = N^3$$

and for any number n of balls the number of cases will be $N^n$. To find
the number of favorable cases, first suppose that a group of h specified
balls falls into a designated compartment. The remaining n - h balls may
be distributed in any way among N - 1 remaining compartments. But
the number of ways to distribute n - h balls among N - 1
compartments is $(N - 1)^{n-h}$, and this becomes the number of all favorable cases
in which a specified group of h balls occupies the designated compartment.
Now, it is possible to form $C_n^h$ such groups; therefore, the total number of
favorable cases is given by

$$C_n^h \cdot (N - 1)^{n-h}$$

and the required probability will be

$$P_h = \frac{C_n^h \cdot (N - 1)^{n-h}}{N^n}.$$


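In case n, N and h are large numbers, the direct application of this
formula becomes difficult, and it is advisable to seek an approximate
expression for $P_h$. To this end we write the preceding expression thus:

$$P_h = \frac{n^h}{1 \cdot 2 \cdot 3 \cdots h \cdot N^h} \left(1 - \frac{1}{N}\right)^{n-h} P$$

where

$$P = \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{h-1}{n}\right).$$

Now, supposing $1 \le k \le h - 1$, we have

$$\left(1 - \frac{k}{n}\right)\left(1 - \frac{h-k}{n}\right) = 1 - \frac{h}{n} + \frac{k(h-k)}{n^2} > 1 - \frac{h}{n}. \tag{a}$$

On the other hand,

$$k(h - k) \le \frac{h^2}{4}$$

and so

$$\left(1 - \frac{k}{n}\right)\left(1 - \frac{h-k}{n}\right) \le \left(1 - \frac{h}{2n}\right)^2. \tag{b}$$

The inequalities (a) and (b) give simple lower and upper limits for P.
For we can write $P^2$ thus:

$$P^2 = \prod_{k=1}^{h-1} \left(1 - \frac{k}{n}\right)\left(1 - \frac{h-k}{n}\right)$$

and then apply (a) or (b), which leads to these inequalities

$$\left(1 - \frac{h}{n}\right)^{\frac{h-1}{2}} < P \le \left(1 - \frac{h}{2n}\right)^{h-1}.$$

Correspondingly, we have

$$P_h > \frac{n^h}{1 \cdot 2 \cdot 3 \cdots h \cdot N^h} \left(1 - \frac{1}{N}\right)^{n-h} \left(1 - \frac{h}{n}\right)^{\frac{h-1}{2}},$$

$$P_h \le \frac{n^h}{1 \cdot 2 \cdot 3 \cdots h \cdot N^h} \left(1 - \frac{1}{N}\right)^{n-h} \left(1 - \frac{h}{2n}\right)^{h-1}.$$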
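
To see how close the two limits are in practice, the exact probability can be compared with them numerically. The Python sketch below assumes that the limits have the form (1 - h/n) raised to the power (h - 1)/2 and (1 - h/2n) raised to the power h - 1, as follows from the inequalities (a) and (b); the function names and the sample values n = 200, N = 20, h = 10 are chosen by us for illustration.

```python
from fractions import Fraction
from math import comb, factorial


def p_exact(n, N, h):
    """Exact probability that a specified compartment holds h of n balls."""
    return Fraction(comb(n, h) * (N - 1) ** (n - h), N ** n)


def p_bounds(n, N, h):
    """Lower and upper limits obtained from inequalities (a) and (b)."""
    common = (n / N) ** h / factorial(h) * (1 - 1 / N) ** (n - h)
    lower = common * (1 - h / n) ** ((h - 1) / 2)
    upper = common * (1 - h / (2 * n)) ** (h - 1)
    return lower, upper


lower, upper = p_bounds(200, 20, 10)
exact = float(p_exact(200, 20, 10))
print(lower, exact, upper)
```

For these values the two limits already agree to three decimals, so that either of them may serve as an approximation to the exact probability.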


Problem 13. What is the probability of obtaining a given sum s of
points with n dice?

Solution. The number of all cases for n dice is evidently $6^n$. The
number of favorable cases is the same as the total number of solutions of
the equation

$$\alpha_1 + \alpha_2 + \cdots + \alpha_n = s \tag{1}$$

where $\alpha_1, \alpha_2, \ldots, \alpha_n$ are integers from 1 to 6. This number can be
determined by means of the following device: Multiplying the polynomial

$$x + x^2 + x^3 + x^4 + x^5 + x^6 \tag{2}$$

by itself, the product will consist of terms

$$x^{\alpha_1 + \alpha_2}$$

where $\alpha_1$ and $\alpha_2$ independently assume all integral values from 1 to 6.
Collecting terms with the same exponent s, the coefficient of $x^s$ will give
the number of solutions of the equation

$$\alpha_1 + \alpha_2 = s,$$

$\alpha_1$, $\alpha_2$ being subject to the above mentioned limitations.

Similarly, multiplying the same polynomial (2) three times by itself
and collecting terms with the same exponent s, the coefficient of $x^s$ will
give the number of solutions of equation (1) for n = 3. In general, the
number of solutions of equation (1) for any n is the coefficient of $x^s$ in
the expanded polynomial

$$(x + x^2 + x^3 + x^4 + x^5 + x^6)^n.$$

Now we have identically

$$x + x^2 + x^3 + x^4 + x^5 + x^6 = \frac{x(1 - x^6)}{1 - x}$$

and by the binomial theorem

$$x^n (1 - x^6)^n = \sum_{l=0}^{n} (-1)^l C_n^l x^{n+6l},$$

$$(1 - x)^{-n} = \sum_{k=0}^{\infty} C_{n+k-1}^{n-1} x^k.$$

Multiplying these series we find the following expression as the
coefficient of $x^s$:

$$\sum_{l} (-1)^l C_n^l C_{s-6l-1}^{n-1}$$

where summation extends over integers l not exceeding $\frac{s-n}{6}$. The same
sum represents the number of favorable cases. Dividing it by $6^n$ we
get the following expression for the probability of s points on n dice:

$$\frac{1}{6^n} \sum_{l=0}^{\left[\frac{s-n}{6}\right]} (-1)^l C_n^l C_{s-6l-1}^{n-1}.$$
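
Both the device of multiplying polynomials and the final sum lend themselves to a direct check. The Python sketch below computes the number of favorable cases in the two ways; it assumes the alternating sum in the form reconstructed from the binomial expansions, and the function names are ours.

```python
from fractions import Fraction
from math import comb


def ways_by_polynomial(n, s):
    """Coefficient of x**s in (x + x**2 + ... + x**6)**n."""
    coeffs = [1]  # the polynomial 1; the list index is the exponent
    for _ in range(n):
        new = [0] * (len(coeffs) + 6)
        for exponent, c in enumerate(coeffs):
            for face in range(1, 7):
                new[exponent + face] += c
        coeffs = new
    return coeffs[s] if 0 <= s < len(coeffs) else 0


def ways_by_formula(n, s):
    """Sum of (-1)**l C(n, l) C(s - 6l - 1, n - 1) for l up to (s - n)/6."""
    if s < n:
        return 0
    return sum(
        (-1) ** l * comb(n, l) * comb(s - 6 * l - 1, n - 1)
        for l in range((s - n) // 6 + 1)
    )


def prob_sum(n, s):
    """Probability of s points on n dice."""
    return Fraction(ways_by_formula(n, s), 6 ** n)


print(ways_by_polynomial(3, 9), ways_by_formula(3, 9), prob_sum(3, 10))
```

The polynomial method needs no formula at all and is convenient for small n, while the alternating sum gives the same count with far fewer operations when n is large.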

The preceding problems suffice to illustrate how probability can be
determined by direct enumeration of cases. For the benefit of students,
a few simple problems without elaborate solutions are added here.

Problems for Solution

1. What is the probability of obtaining 9, 10, 11 points with 3 dice?
Ans. 25/216, 27/216, 27/216.

2. What is the probability of obtaining 2 heads and 2 tails when 4 coins are
thrown? Ans. 3/8.

3. Two urns contain respectively 3 white, 7 red, 15 black balls, and 10 white,
6 red, 9 black balls. One ball is taken from each urn. What is the probability that
they both will be of the same color? Ans. 207/625.

4. What is the probability that of 6 cards taken from a full pack, 3 will be black
and 3 red? Ans. $(C_{26}^3)^2 / C_{52}^6$ = 0.332 approximately.

5. Ten cards are taken from a full pack. What is the probability of finding
among them (a) at least one ace; (b) at least two aces? Ans. 349/595; 1257/7735.

6. The face cards are removed from a full pack. Out of the 40 remaining cards,
4 are drawn. What is the probability that they belong to different suits?
Ans. 1000/9139.

7. Under the same conditions, what is the probability that the 4 cards belong to
different suits and different denominations? Ans. 504/9139.

8. Five cards are taken from a full pack. Find the probabilities (a) that they are
of different denominations; (b) that 2 are of the same denomination and 3 scattered;
(c) that one pair is of one denomination and another pair of a different denomination,
and one odd; (d) that 3 are of the same denomination and 2 scattered; (e) that 2 are
of one denomination and 3 of another; (f) that 4 are of one denomination and 1 of
another.

Ans. (a) $\frac{2112}{4165}$; (b) $\frac{352}{833}$; (c) $\frac{198}{4165}$; (d) $\frac{88}{4165}$; (e) $\frac{6}{4165}$; (f) $\frac{1}{4165}$.

9. What is the probability that 5 tickets taken in succession in the French lottery
will present an increasing or decreasing sequence of numbers? Ans. $\frac{1}{60}$.

10. What is the probability that among 5 tickets drawn in the French lottery there
is at least one with a one-digit number? Ans. $\frac{46282}{110983} = 0.417$.

11. Twelve balls are distributed at random among three boxes. What is the
probability that the first box will contain 3 balls? Ans. $\frac{55 \cdot 2^{11}}{3^{12}} = 0.2120$.

12. In Prob, 12 (page 22) what is the most probable number of balls in a specified 
box? Ans. The probability ’ 



is the greatest if the integer h is determined by the conditions 

13. Apply these considerations to the case of n = 200, N = 20. Ans. Since
= 10 the inequalities on page 23 give 

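$$P_{10} < \frac{10^{10}}{10!}\left(1 - \frac{1}{20}\right)^{190}\left(1 - \frac{1}{40}\right)^{9},$$

$$P_{10} > \frac{10^{10}}{10!}\left(1 - \frac{1}{20}\right)^{190}\left(1 - \frac{1}{20}\right)^{9/2}.$$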

To find an approximate value of

$$\left(1 - \frac{1}{20}\right)^{190}$$

note that

$$\left(1 - \frac{1}{20}\right)^{190} = e^{-190\left[\frac{1}{20} + \frac{1}{2 \cdot 20^2} + \cdots\right]}.$$

To 3 decimals

$$P_{10} = 0.128.$$


14. Four different objects, 1, 2, 3, 4, are distributed at random on four places
marked 1, 2, 3, 4. What is the probability that none of the objects occupies the place
corresponding to its number? Ans. $\frac{3}{8}$.

15. Two urns contain, respectively, 1 black and 2 white balls, and 2 black and
1 white ball. One ball is transferred from the first urn into the second, after which a
ball is drawn from the second urn. What is the probability that it is white?
Ans. $\frac{5}{12}$.

16. What is the probability of getting 20 points with 6 dice?
Ans. $\frac{4221}{46656} = 0.09047$.

17. An urn contains a white and b black balls. Balls are drawn one by one until
only those of the same color are left. What is the probability that they are white?


18. In an urn there are n groups of p objects each. Objects in different groups are
distinguished by some characteristic property. What is the probability that among
$\alpha_1 + \alpha_2 + \cdots + \alpha_n$ objects $(0 \le \alpha_i \le p;\ i = 1, 2, \ldots, n)$ taken, there are $\alpha_1$ of
one group, $\alpha_2$ of another, etc.? Ans. Let $\lambda$ among the numbers $\alpha_1, \alpha_2, \ldots, \alpha_n$ be
equal to $a$, $\mu$ be equal to $b$, . . . $\sigma$ be equal to $l$. The required probability is

$$\frac{n!}{\lambda!\,\mu! \cdots \sigma!} \cdot \frac{C_p^{\alpha_1} C_p^{\alpha_2} \cdots C_p^{\alpha_n}}{C_{np}^{\alpha_1 + \alpha_2 + \cdots + \alpha_n}}.$$

Problem 8 is a particular case of this.

19. There are N tickets numbered 1, 2, . . . N of which n are taken at random and
arranged in increasing order of their numbers: $x_1 < x_2 < \cdots < x_n$. What is the
probability that $x_m = M$? Ans.

$$\frac{C_{M-1}^{m-1} C_{N-M}^{n-m}}{C_N^n}.$$

THEOREMS OF TOTAL AND COMPOUND PROBABILITY

1. As the problems become more complex the difficulties in enumerating
cases grow and often the computation of probabilities by direct
application of the definition becomes very involved. In many cases the
complications can be avoided by use of two theorems which are fundamental
in the theory of probability.

Before we can give a clear and exact statement of the first fundamental
theorem, we must define what is meant by “mutually exclusive” or
“incompatible” events. Events are called mutually exclusive or
incompatible if the occurrence of one of them precludes the occurrence
of all the others. For instance, the four events concerning the number
of points on two dice

First Die    Second Die
    1            4
    2            3
    3            2
    4            1

are mutually exclusive because it is evident that as soon as one of them 
occurs, none of the others can materialize. 

On the contrary, events are compatible if it is possible for them to
materialize simultaneously. For instance, the events of 5 points on one 
die and 5 points on the other, are compatible, since in tossing two dice 
it is possible to get 5 points on each. 

To denote the probability of an event A, we shall use the symbol (A).
To denote the probability of A or B (or both) we shall use the symbol
(A + B). Dealing with several events A, B, . . . L, the symbol

$$(A + B + \cdots + L)$$

will denote the probability of the occurrence of at least one of them.
If A, B, . . . L are mutually exclusive events, this symbol represents
the probability of the occurrence of one of them without specification as
to which one.

2. Now we shall state the first fundamental theorem, called the
“theorem of total probability” or “theorem of addition of probabilities,”
in the following way:

Theorem of Total Probability. The probability for one of the mutually
exclusive events $A_1, A_2, \ldots, A_n$ to materialize is the sum of the probabilities
of these events. In symbolical notation, it is expressed thus:

$$(A_1 + A_2 + \cdots + A_n) = (A_1) + (A_2) + \cdots + (A_n).$$

Proof. Let N be the number of all possible and equally likely cases
out of which $m_1$ cases are favorable to the event $A_1$, $m_2$ cases are favorable
to the event $A_2$, . . . , and finally, $m_n$ cases are favorable to the event $A_n$.
These cases are all different, since events $A_1, A_2, \ldots, A_n$ are incompatible.
The number of cases favorable to either $A_1$ or $A_2$, . . . or $A_n$ is
therefore

$$m_1 + m_2 + \cdots + m_n.$$

Hence, by definition

$$(A_1 + A_2 + \cdots + A_n) = \frac{m_1 + m_2 + \cdots + m_n}{N} = \frac{m_1}{N} + \frac{m_2}{N} + \cdots + \frac{m_n}{N}.$$

Again, by definition of probability,

$$\frac{m_1}{N} = (A_1); \qquad \frac{m_2}{N} = (A_2); \qquad \ldots \qquad \frac{m_n}{N} = (A_n),$$

and so finally

$$(A_1 + A_2 + \cdots + A_n) = (A_1) + (A_2) + \cdots + (A_n),$$

as stated.

3. It is important to know that the same theorem, stated in a slightly
different form, is especially useful in applications. An event A can
occur in several mutually exclusive forms, $A_1, A_2, \ldots, A_n$, which may
be considered as that many mutually exclusive events. Whenever A
occurs, one of these events must occur, and conversely. Consequently,
the probability of A is the same as the probability of one (unspecified)
of its mutually exclusive forms. If, for instance, occurrence of 5 points
on two dice is A, then this event occurs in 4 mutually exclusive forms, as
tabulated above.

From the new point of view, the theorem of total probability can be 
stated thus: 

Second Form of Theorem of Total Probability. The probability of
an event A is the sum of the probabilities of its mutually exclusive forms
$A_1, A_2, \ldots, A_n$; or, using symbols,

$$(A) = (A_1) + (A_2) + \cdots + (A_n).$$

Probabilities $(A_1), (A_2), \ldots, (A_n)$ are partial probabilities of incompatible
forms of A. Since the probability (A) is their sum, it may be called
a total probability of A. Hence the name of the theorem.

In the preceding example we saw that 5 points on two dice could be
obtained in 4 mutually exclusive ways. Now the probability of any one
of these ways is $\frac{1}{36}$; hence, by the preceding theorem, the probability
of obtaining 5 points with two dice is

$$\frac{1}{36} + \frac{1}{36} + \frac{1}{36} + \frac{1}{36} = \frac{4}{36} = \frac{1}{9},$$

as it should be.
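
As an illustration only, the second form of the theorem can be checked by listing the 36 equally likely cases for two dice. The names `cases`, `forms`, `partial` and `total` below are assumptions introduced for this sketch.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely cases (first die, second die).
cases = list(product(range(1, 7), repeat=2))

# The four mutually exclusive forms of the event "5 points on two dice".
forms = [(1, 4), (2, 3), (3, 2), (4, 1)]

# Partial probability of each form, found by counting favorable cases.
partial = [Fraction(cases.count(form), len(cases)) for form in forms]

# Total probability of the event, found by counting directly.
total = Fraction(sum(1 for c in cases if sum(c) == 5), len(cases))

if __name__ == "__main__":
    print(partial, sum(partial), total)
```

The sum of the partial probabilities and the directly counted probability agree, as the theorem requires.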

If events $A_1, A_2, \ldots, A_n$ are not only mutually exclusive, but
“exhaustive,” which means that one of them must necessarily take place,
the probability that one of them will happen is a certainty = 1, so that
we must have

$$(A_1) + (A_2) + \cdots + (A_n) = 1.$$

An event which is not certain may or may not happen; this constitutes
two mutually exclusive cases. It is customary to call the nonoccurrence of a
given event A the event “opposite” to A, and we shall denote it
by the symbol $\bar{A}$. Now $A$ and $\bar{A}$ constitute two exhaustive and mutually
exclusive cases. Hence, by the preceding remark

$$(A) + (\bar{A}) = 1.$$

That is, if p is the probability of A,

$$q = 1 - p$$

represents the probability that A will not occur.

4. If an event A is considered in connection with another event B,
the compound event AB consists in simultaneous occurrence of A and B.
For three events A, B, C, the compound event ABC consists in simultaneous
occurrence of A and B and C, and so on for any number of
component events. We shall denote the probability of a compound
event AB . . . L by the symbol

$$(AB \ldots L).$$

An event A can materialize in two mutually exclusive forms, namely,
as A and B or A and $\bar{B}$. Hence, by the theorem of total probability

$$(A) = (AB) + (A\bar{B}).$$

Similarly

$$(B) = (BA) + (B\bar{A}),$$

or, since the symbol (BA) does not depend upon the order of letters,

$$(B) = (AB) + (\bar{A}B).$$

The sum (A) + (B) can be expressed as

$$(A) + (B) = (AB) + [(AB) + (A\bar{B}) + (\bar{A}B)].$$

Again, by the theorem of total probability, the sum

$$(AB) + (A\bar{B}) + (\bar{A}B)$$

represents the probability (A + B) of the occurrence of at least one of
the events A or B. The preceding equation leads to the useful formula

$$(A + B) = (A) + (B) - (AB) \tag{1}$$

which obviously is a generalization of the theorem of total probability;
for (AB) = 0 if A and B are incompatible. Equation (1) can be used to
derive an important inequality. Since $(A + B) \le 1$, it follows from (1)
that

$$(AB) \ge (A) + (B) - 1.$$

If B itself is a compound event $A_1A_2$, this inequality leads to

$$(AA_1A_2) \ge (A) + (A_1A_2) - 1.$$

But

$$(A_1A_2) \ge (A_1) + (A_2) - 1,$$

and so

$$(AA_1A_2) \ge (A) + (A_1) + (A_2) - 2$$

for three component events. Proceeding in the same manner, we can
establish the following general inequality:

$$(AA_1A_2 \cdots A_{n-1}) \ge (A) + (A_1) + (A_2) + \cdots + (A_{n-1}) - (n - 1).$$

Applying this inequality to events $\bar{A}, \bar{A}_1, \ldots, \bar{A}_{n-1}$ respectively
opposite to $A, A_1, \ldots, A_{n-1}$, we get

$$(\bar{A}\bar{A}_1 \cdots \bar{A}_{n-1}) \ge (\bar{A}) + (\bar{A}_1) + \cdots + (\bar{A}_{n-1}) - (n - 1),$$

or, since $(\bar{A}_i) = 1 - (A_i)$,

$$(A) + (A_1) + \cdots + (A_{n-1}) \ge 1 - (\bar{A}\bar{A}_1 \cdots \bar{A}_{n-1}).$$

Now the compound event $\bar{A}\bar{A}_1 \ldots \bar{A}_{n-1}$ means that neither $A$ nor
$A_1$, . . . nor $A_{n-1}$ occurs. The event opposite to this is that at least
one of the events $A, A_1, \ldots, A_{n-1}$ occurs. Hence,

$$1 - (\bar{A}\bar{A}_1 \cdots \bar{A}_{n-1}) = (A + A_1 + \cdots + A_{n-1}),$$

and we reach the following important inequality:

$$(A + A_1 + \cdots + A_{n-1}) \le (A) + (A_1) + \cdots + (A_{n-1}).$$

5. Equation (1) can be extended to the case of more than two events.
Let B mean the occurrence of at least one of the events $A_1$ or $A_2$. Then
by (1)

$$(A + A_1 + A_2) = (A) + (A_1 + A_2) - (AB).$$

As to $(A_1 + A_2)$, its expression is given by (1). The compound event
AB means the occurrence of at least one of the events $AA_1$ or $AA_2$.
Hence, applying equation (1) once more, we find

$$(AB) = (AA_1 + AA_2) = (AA_1) + (AA_2) - (AA_1A_2)$$

and after due substitutions

$$(A + A_1 + A_2) = (A) + (A_1) + (A_2) - (AA_1) - (AA_2) - (A_1A_2) + (AA_1A_2).$$

Proceeding in the same way and using mathematical induction, the
following general formula can be established:

$$(A + A_1 + \cdots + A_{n-1}) = \sum_{i} (A_i) - \sum_{i,j} (A_iA_j) + \sum_{i,j,k} (A_iA_jA_k) - \cdots,$$

where $A_0$ is written for $A$ and summations refer to all combinations of subscripts taken from
numbers 0, 1, 2, . . . n − 1, one, two, three, . . . , and n at a time.
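
The general formula lends itself to a numerical check. The following is a minimal sketch, not taken from the book: events are modeled as sets of equally likely cases for two dice, and the three events `A0`, `A1`, `A2` are arbitrary choices made here for illustration. The helper names `prob` and `union_by_formula` are likewise assumptions.

```python
from fractions import Fraction
from itertools import combinations, product

cases = set(product(range(1, 7), repeat=2))  # 36 equally likely cases


def prob(event):
    """Probability of an event given as a set of favorable cases."""
    return Fraction(len(event), len(cases))


def union_by_formula(events):
    """Alternating sum over all combinations of events taken 1, 2, ... at a time."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r + 1)
        for group in combinations(events, r):
            total += sign * prob(set.intersection(*group))
    return total


# Illustrative events (assumed here): six on the first die, six on the
# second die, and at least 10 points in all.
A0 = {c for c in cases if c[0] == 6}
A1 = {c for c in cases if c[1] == 6}
A2 = {c for c in cases if sum(c) >= 10}
events = [A0, A1, A2]

direct = prob(A0 | A1 | A2)  # probability that at least one event occurs

if __name__ == "__main__":
    print(direct, union_by_formula(events), sum(prob(e) for e in events))
```

The same run also illustrates the inequality of the preceding section: the probability of at least one event does not exceed the sum of the separate probabilities.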

6. Let A and B be two events whose probabilities are (A) and (B).
It is understood that the probability (A) is determined without any
regard to B when nothing is known about the occurrence or nonoccurrence
of B. When it is known that B occurred, A may have a different
probability, which we shall denote by the symbol (A, B) and call
“conditional probability of A, given that B has actually happened.”

Now we can state the second fundamental theorem, called the
“theorem of compound probability” or “theorem of multiplication of
probabilities,” as follows:

Theorem of Compound Probability. The probability of simultaneous
occurrence of A and B is given by the product of the unconditional probability
of the event A by the conditional probability of B, supposing that A actually
occurred. In other words,

$$(AB) = (A) \cdot (B, A).$$

Proof. Let N denote the total number of equally likely cases among
which m cases are favorable to the event A. The cases favorable to A
and B are to be found among the m cases favorable to A. Let their
number be $m_1$. Then, by the definition of probability,

$$(AB) = \frac{m_1}{N},$$

which also can be written thus:

$$(AB) = \frac{m}{N} \cdot \frac{m_1}{m}.$$

Now the ratio m/N represents the probability of A. To find the meaning
of the second factor, we observe that, assuming the occurrence of A,
there are only m equally likely cases left (the remaining N − m cases
becoming impossible) out of which $m_1$ are favorable to B. Hence the
ratio $m_1/m$ represents the conditional probability (B, A) of B supposing
that A has actually happened.

Now since

$$\frac{m}{N} = (A), \qquad \frac{m_1}{m} = (B, A),$$

the probability of the compound event AB is expressed by the product

$$(AB) = (A) \cdot (B, A).$$

Since the compound event AB involves A and B symmetrically,
we shall have also

$$(AB) = (B) \cdot (A, B).$$

The theorem of compound probability can easily be extended to several
events. For example, let us consider three events, A, B, C. The occurrence
of A and B and C is evidently equivalent to the occurrence of the
compound event AB and C. We have, therefore,

$$(ABC) = (AB) \cdot (C, AB)$$

by the theorem of compound probability. By the same theorem

$$(AB) = (A) \cdot (B, A),$$

so that

$$(ABC) = (A) \cdot (B, A) \cdot (C, AB).$$

Obviously this formula can be extended to compound events consisting
of more than three components.

In one particular but very important case, the expression for the
compound probability can be simplified; namely, in the case of so-called
“independent events.” Several events are “independent” by definition
if the probability of any one of them is not affected by supplementary
knowledge concerning the materialization of any number of the remaining
events. For instance, if A and B represent white balls drawn from
two different urns, the probability of A is the same whether the color
of the ball drawn from the other urn is known or not. Similarly, granted
that a coin is unbiased, heads at the first throw and heads at the second
throw are independent events. In such theoretical cases the independence
of events can be reasonably assumed or agreed upon. In other
cases, and especially in practical applications, it is not easy to decide
whether events should be considered as independent or not.

If A and B are independent, the conditional probability (B, A) is
the same as the probability (B) found without any reference to A; this
follows from the definition of independence. Hence, the expression of
compound probability (AB) for two independent events becomes

$$(AB) = (A) \cdot (B)$$

so that the probability of a compound event with independent components
is simply equal to the product of the probabilities of component
events. This rule extends to any number of component events if they
are independent. Let us consider three independent events, A, B, and C.
The independence of these events implies

$$(B, A) = (B); \qquad (C, AB) = (C)$$

and hence

$$(ABC) = (A) \cdot (B) \cdot (C)$$

in accordance with the rule.

To illustrate the theorem of compound probability, let us consider
two simple examples. An urn contains 2 white balls and 3 black ones.
Two balls are drawn, and it is required to find the probability that they
are both white. Let A be the event consisting in the white color of the
first ball, and B the event consisting in the white color of the second ball.
The probability (A) of extracting a white ball in the first place is

$$(A) = \frac{2}{5}.$$

To find the conditional probability (B, A) we observe, after drawing one
white ball, that 1 white and 3 black balls remain in the urn. The
probability of drawing a white ball under such circumstances is

$$(B, A) = \frac{1}{4}.$$

Now, by the theorem of compound probability, we shall have

$$(AB) = \frac{2}{5} \cdot \frac{1}{4} = \frac{1}{10}.$$

Evidently, in this example we dealt with dependent events.
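
The urn example can also be verified by counting cases exactly as in the proof of the theorem. The sketch below is illustrative only; the ball labels and the names `N`, `m`, `m1` mirror the notation of the proof but are otherwise assumptions made here.

```python
from fractions import Fraction
from itertools import permutations

# Urn with 2 white and 3 black balls; the labels are purely illustrative.
balls = ["w1", "w2", "b1", "b2", "b3"]

# All equally likely ordered pairs (first ball, second ball).
draws = list(permutations(balls, 2))

N = len(draws)                                                   # all cases
m = sum(1 for d in draws if d[0][0] == "w")                      # favorable to A
m1 = sum(1 for d in draws if d[0][0] == "w" and d[1][0] == "w")  # favorable to A and B

p_A = Fraction(m, N)           # (A)
p_B_given_A = Fraction(m1, m)  # (B, A)
p_AB = Fraction(m1, N)         # (AB)

if __name__ == "__main__":
    print(p_A, p_B_given_A, p_AB)
```

Counting gives N = 20, m = 8 and m1 = 2, so the product of the two factors reproduces the compound probability.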

As an example of independent events, let a coin be tossed any given
number of times; say, n times. What is the probability of having only
heads? The compound event in this example consists of n independent
components; namely, heads at every trial. Now the probability of
heads in any trial is $\frac{1}{2}$, and so the required probability will be $\frac{1}{2^n}$.

Note: Two events A and B are independent by definition, if

$$(A, B) = (A) \quad \text{and} \quad (B, A) = (B).$$

However, one of these conditions follows from the other. Suppose the condition

$$(A, B) = (A)$$

is fulfilled, so that A is independent of B. We have then

$$(AB) = (B) \cdot (A).$$

On the other hand,

$$(AB) = (A) \cdot (B, A),$$

whence

$$(B, A) = (B),$$

so that B is independent of A.

Three events A, B, C are independent if the following four conditions are fulfilled:

$$(A, B) = (A); \quad (A, C) = (A); \quad (B, C) = (B); \quad (C, AB) = (C).$$

From the first three conditions it follows that

$$(B, A) = (B); \quad (C, A) = (C); \quad (C, B) = (C).$$

To show that the other requirements

$$(B, AC) = (B); \quad (A, BC) = (A)$$

are also fulfilled, we notice that

$$(ABC) = (A) \cdot (B, A) \cdot (C, AB) = (A) \cdot (B) \cdot (C)$$

because (C, AB) = (C) by hypothesis and (B, A) = (B) as proved. On the other
hand,

$$(ABC) = (A) \cdot (C, A) \cdot (B, AC)$$

and (C, A) = (C). Hence, comparing with the preceding expression,

$$(B, AC) = (B).$$

Similarly, it can be shown that

$$(A, BC) = (A).$$

The independence of four events A, B, C, D is assured if the following 11 conditions
are fulfilled:

$$(A, B) = (A, C) = (A, D) = (A); \quad (B, C) = (B, D) = (B); \quad (C, D) = (C);$$

$$(C, AB) = (C); \quad (D, AB) = (D, AC) = (D, BC) = (D); \quad (D, ABC) = (D).$$

And in general, independence of n events is assured if $2^n - n - 1$ conditions of
similar type are fulfilled.

If several events are independent, every two of them are independent; but this
does not suffice for the independence of all events, as can be shown by a simple example.
An urn contains four tickets with numbers 112, 121, 211, 222, and one ticket is
drawn. What are the probabilities that the first, second, or third digits in its number
are 1? Let a unit as the first, second, or third digit be represented, respectively,
by A, B, or C. Then

$$(A) = (B) = (C) = \frac{2}{4} = \frac{1}{2}.$$

Compound probabilities (AB), (AC), (BC) are

$$(AB) = (AC) = (BC) = \frac{1}{4},$$

since among four tickets there is only one whose number has first and second, or
first and third, or second and third digits of 1. Now, for instance,

$$(AB) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = (A) \cdot (B),$$

whence A and B are independent. Similarly, A and C; C and B are independent.
Thus, any two of the events A, B, C are independent, but not all three events are.
For, if they were, we should have

$$(ABC) = \frac{1}{8}.$$

But (ABC) = 0 since in no ticket are all three digits equal to 1.
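
The ticket example is small enough to check by enumeration. This is a hedged sketch; the helper `prob` and the variables `pairwise` and `triple` are names assumed for the illustration.

```python
from fractions import Fraction
from itertools import combinations

tickets = ["112", "121", "211", "222"]  # four equally likely tickets


def prob(*positions):
    """Probability that every listed digit position (0, 1, 2) shows the digit 1."""
    hits = sum(1 for t in tickets if all(t[i] == "1" for i in positions))
    return Fraction(hits, len(tickets))


# Every pair of events satisfies the product rule ...
pairwise = all(prob(i, j) == prob(i) * prob(j)
               for i, j in combinations(range(3), 2))

# ... but the three events taken together do not.
triple = prob(0, 1, 2) == prob(0) * prob(1) * prob(2)

if __name__ == "__main__":
    print(pairwise, triple, prob(0, 1, 2))
```

Here `pairwise` comes out true while `triple` comes out false, which is the point of the example.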

7. The theorems of total and compound probability form the foundation
of the theory of probability as it represents a separate branch of
mathematical science. They serve the purpose of finding probabilities
in more complicated cases, either by being directly applied or by enabling
us to form equations from which the required probabilities can be found.
A few selected problems will illustrate the various ways of using these
theorems.

Problem 14. An urn contains a white balls and b black balls; another
contains c white and d black balls. One ball is transferred from the first
urn into the second, and then a ball is drawn from the latter. What is
the probability that it will be a white ball?

Solution. The event consisting in the white color of the ball drawn
from the second urn can materialize under two mutually exclusive forms:
when the transferred ball is a white one, and when it is black. By the
theorem of total probability, we must find the probabilities corresponding
to these two forms. To find the probability of the first form, we observe
that it represents a compound event consisting in the white color of the
transferred ball, combined with the white color of the extracted ball.
The probability that the transferred ball is white is given by the fraction

$$\frac{a}{a + b}$$

and the probability that the ball removed from the second urn is white, is

$$\frac{c + 1}{c + d + 1}$$

because before the drawing there were c + 1 white balls and d black
balls in the second urn. Hence, by the theorem of compound probability,
the probability of the first form is

$$\frac{a(c + 1)}{(a + b)(c + d + 1)}.$$

In the same way, we find that the probability of the second form is

$$\frac{bc}{(a + b)(c + d + 1)},$$

and the sum of these two numbers

$$\frac{ac + bc + a}{(a + b)(c + d + 1)}$$

gives the probability of extracting a white ball from the second urn, after
one ball of unknown color has been transferred from the first urn.
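
The two-step reasoning of this solution translates directly into exact arithmetic. The sketch below is an illustration under the assumption that a, b, c, d are positive integers; the function names are introduced here and are not from the text.

```python
from fractions import Fraction


def white_after_transfer(a, b, c, d):
    """Total probability of a white ball from the second urn, built from the two forms."""
    p_white_moved = Fraction(a, a + b)
    p_black_moved = Fraction(b, a + b)
    # Compound probability of each mutually exclusive form.
    first_form = p_white_moved * Fraction(c + 1, c + d + 1)
    second_form = p_black_moved * Fraction(c, c + d + 1)
    return first_form + second_form


def closed_form(a, b, c, d):
    """The final expression (ac + bc + a) / ((a + b)(c + d + 1))."""
    return Fraction(a * c + b * c + a, (a + b) * (c + d + 1))


if __name__ == "__main__":
    print(white_after_transfer(2, 1, 1, 2), closed_form(2, 1, 1, 2))
```

With 2 white and 1 black ball in the first urn and 1 white and 2 black in the second, both functions return 5/12.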

8. Problem 15. Two players agree to play under the following
conditions: Taking turns, they draw the balls out of an urn containing
a white balls and b black balls, one ball at a time. He who extracts the
first white one wins the game. What is the probability that the player
who starts will win the game?

Solution. Let A be the player who draws the first ball, and let B
be the other player. The game can be won by A, first, if he extracts a
white ball at the start; second, if A and B alternately extract 2 black
balls and then A draws a white one; third, if A and B alternately extract
4 black balls and the fifth ball drawn by A is white; and so on. By the
theorem of total probability, the probability for A to win the game
is the sum of the probabilities of the mutually exclusive ways (described
above) in which he can win the game. The probability of extracting a
white ball at first is

$$\frac{a}{a + b}.$$

The probability of extracting 2 black balls and then 1 white ball is found
by direct application of the theorem of compound probability. Its
expression is

$$\frac{b(b - 1)a}{(a + b)(a + b - 1)(a + b - 2)}.$$

The probability of extracting 4 black balls and then 1 white ball is given
by

$$\frac{b(b - 1)(b - 2)(b - 3)a}{(a + b)(a + b - 1)(a + b - 2)(a + b - 3)(a + b - 4)}$$

using the same theorem of compound probability.

In the same way we deal with all the possible and mutually exclusive
ways which would allow A to win the game. Then, by adding the above
given expressions of partial probabilities, we obtain the expression for the
required probability in the form of the sum

$$P = \frac{a}{a + b}\left[1 + \frac{b(b - 1)}{(a + b - 1)(a + b - 2)} + \frac{b(b - 1)(b - 2)(b - 3)}{(a + b - 1)(a + b - 2)(a + b - 3)(a + b - 4)} + \cdots\right].$$

The law of formation of different terms in this sum is obvious; and
the sum automatically ends as soon as we arrive at a term which is equal
to zero.

In the same way, we can find that the probability for the player B
to win is expressed by an analogous sum:

$$Q = \frac{a}{a + b}\left[\frac{b}{a + b - 1} + \frac{b(b - 1)(b - 2)}{(a + b - 1)(a + b - 2)(a + b - 3)} + \cdots\right].$$

But one of the players, A or B, must win the game, and the winning of
the game by A and B are opposite events. Hence,

$$P + Q = 1$$

or, after substituting the above expressions for P and Q and after obvious
simplifications,

$$1 + \frac{b}{a + b - 1} + \frac{b(b - 1)}{(a + b - 1)(a + b - 2)} + \cdots = \frac{a + b}{a}.$$

This is a noteworthy identity, obtained, as we see, by the principles
of the theory of probability. Of course, it can be proved in a direct
way, and it would be a good problem for students to attempt a direct
proof. There are many cases in which, by means of considerations
belonging to the theory of probability, several identities or inequalities
can be established whose direct proof sometimes involves considerable
difficulty.
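
The sums for the two players, and the identity they imply, can be checked with exact fractions. The following is a sketch under the assumption that a ≥ 1 and b ≥ 0 are integers; the function names `win_probabilities` and `identity_left_side` are introduced here for illustration.

```python
from fractions import Fraction


def win_probabilities(a, b):
    """Return (P, Q): chances that the first and the second player draw the first white ball."""
    wins = [Fraction(0), Fraction(0)]
    all_black_so_far = Fraction(1)           # probability that the first k balls were black
    for k in range(b + 1):                   # k black balls precede the white one
        remaining = a + b - k
        wins[k % 2] += all_black_so_far * Fraction(a, remaining)
        all_black_so_far *= Fraction(b - k, remaining)
    return wins[0], wins[1]


def identity_left_side(a, b):
    """1 + b/(a+b-1) + b(b-1)/((a+b-1)(a+b-2)) + ... until the terms vanish."""
    total, term = Fraction(0), Fraction(1)
    for k in range(b + 1):
        total += term
        if k < b:                            # the next term would be zero when k == b
            term *= Fraction(b - k, a + b - 1 - k)
    return total


if __name__ == "__main__":
    P, Q = win_probabilities(3, 4)
    print(P, Q, P + Q, identity_left_side(3, 4))
```

For every pair of values tried, the two probabilities add up to 1 and the left side of the identity equals (a + b)/a.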

9. Problem 16. Each of k urns contains n identical balls numbered
from 1 to n. One ball is drawn from every urn. What is the probability
that m is the greatest number drawn?

Solution. Let us denote by $P_m$ the required probability. It is not
apparent how we can find the explicit expression for this probability, but
using the theorems of total and compound probability, we can form
equations which yield the desired expression for $P_m$ without any difficulty.
To this end, let us first find the probability P that the greatest number
drawn does not exceed m. It is obvious that this may happen in m
mutually exclusive ways; namely, when the greatest number drawn is
1, 2, 3, and so on up to m. The probabilities of these different hypotheses
being $P_1, P_2, \ldots, P_m$, their sum gives the following first expression for
P:

$$P = P_1 + P_2 + \cdots + P_m. \tag{1}$$

We can find the second expression for P using the theorem of compound
probability; namely, the greatest number drawn does not exceed
m if balls drawn from all urns have numbers from 1 to m. The probability
of drawing a ball with the number 1, 2, 3, ... m from any urn is
m/n. And the event that this will happen for every urn is a
compound event consisting of k independent events with the same
probability m/n. Therefore, by the theorem of compound probability

$$P = \frac{m^k}{n^k}.$$

And this compared with (1) gives the equation

$$P_1 + P_2 + \cdots + P_m = \frac{m^k}{n^k}. \tag{2}$$

Substituting m − 1 for m in this equation, we get

$$P_1 + P_2 + \cdots + P_{m-1} = \frac{(m - 1)^k}{n^k}$$

and it suffices to subtract this from (2) to have the required expression for
$P_m$:

$$P_m = \frac{m^k - (m - 1)^k}{n^k}.$$
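
For small values of n and k the expression for the probability that m is the greatest number drawn, (m^k − (m − 1)^k)/n^k, can be compared with a direct count. This is an illustrative sketch; the function names are assumptions made here.

```python
from fractions import Fraction
from itertools import product


def p_max_formula(m, n, k):
    """Probability that m is the greatest number drawn, by the closed expression."""
    return Fraction(m ** k - (m - 1) ** k, n ** k)


def p_max_brute(m, n, k):
    """The same probability by enumerating all n**k equally likely drawings."""
    draws = product(range(1, n + 1), repeat=k)
    hits = sum(1 for d in draws if max(d) == m)
    return Fraction(hits, n ** k)


if __name__ == "__main__":
    n, k = 4, 3
    for m in range(1, n + 1):
        print(m, p_max_formula(m, n, k), p_max_brute(m, n, k))
```

The probabilities for m = 1, 2, . . . n add up to 1, as they must for exhaustive and mutually exclusive hypotheses.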

10. Problem 17. Two persons, A and B, have respectively n + 1
and n coins, which they toss simultaneously. What is the probability
that A will have more heads than B?

Solution. Let $\mu$, $\mu'$ and $\nu$, $\nu'$ be numbers of heads and tails thrown
by A and B, respectively, so that $\mu + \nu = n + 1$, $\mu' + \nu' = n$. The
required probability P is the probability of the inequality $\mu > \mu'$. The
probability 1 − P of the opposite event $\mu \le \mu'$ is at the same time
the probability of the inequality $\nu > \nu'$; that is, 1 − P is the probability
that A will throw more tails than B. By reason of symmetry 1 − P = P,

$$P = \frac{1}{2}.$$
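
The symmetry argument can be confirmed by counting outcomes for small n. The sketch below is only an illustration; `prob_a_more_heads` is a name assumed here.

```python
from fractions import Fraction
from math import comb


def prob_a_more_heads(n):
    """Exact probability that n + 1 coins show more heads than n coins."""
    total = 2 ** (2 * n + 1)  # all equally likely outcomes of the 2n + 1 coins
    favorable = sum(comb(n + 1, i) * comb(n, j)
                    for i in range(n + 2)      # heads thrown by A
                    for j in range(n + 1)      # heads thrown by B
                    if i > j)
    return Fraction(favorable, total)


if __name__ == "__main__":
    print([prob_a_more_heads(n) for n in range(6)])
```

Each value in the printed list equals 1/2, whatever n is.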

11. Problem 18. Three players A, B, and C agree to play a series of
games observing the following rules: two players participate in each game,
while the third is idle, and the game is to be won by one of them. The
loser in each game quits, and his place in the next game is taken by the
player who was idle. The player who succeeds in winning over both
of his opponents without interruption wins the whole series of games.
Supposing that the probability for each player to win a single game is
$\frac{1}{2}$, and that the first game is played by A and B, find the probability for
A, B, and C, respectively, to win the whole series, if (a) the number of
games to be played is limited and may not exceed a given number n;
if (b) the number of games is unlimited.

Solution. Let $P_n$, $Q_n$, $R_n$ be the probabilities for A, B, and C, respectively,
to win a series of games when their number cannot exceed n. By
reason of symmetry, $P_n = Q_n$ so that it remains to find $P_n$ and $R_n$.
The player A can win the whole series of games in two mutually exclusive
ways: if he wins the first game, or if he loses the first game. Let the
probability of the first case be $p_n$ and that of the second $r_n$. Then

$$P_n = p_n + r_n.$$

A can win the whole series after winning the first game, in two mutually
exclusive ways: (a) if he wins over B and C in succession; (b) if he wins
the first game from B and loses the second game to C; then, if in the third
game C loses to B, and in the fourth game A wins over B and later wins
the whole series of not more than n − 3 games. Now, the probability
of case (a) is $\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$ by the theorem of compound probability;
that of case (b) by the same theorem is $\frac{1}{8}p_{n-3}$ and the total probability is

$$p_n = \frac{1}{4} + \frac{1}{8}p_{n-3}. \tag{1}$$

If A loses the first game to B, but wins the whole series, then in the
second game C wins over B while the third game is won by A, and not
more than n − 2 games are left to play. Hence,

$$r_n = \frac{1}{4}p_{n-2}. \tag{2}$$

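Since evidently $p_2 = p_3 = p_4 = \frac{1}{4}$, equation (1) by successive
substitutions yields

$$p_{3k} = \frac{1}{4}\left(1 + \frac{1}{8} + \frac{1}{8^2} + \cdots + \frac{1}{8^{k-1}}\right)$$

$$p_{3k+1} = \frac{1}{4}\left(1 + \frac{1}{8} + \frac{1}{8^2} + \cdots + \frac{1}{8^{k-1}}\right)$$

$$p_{3k+2} = \frac{1}{4}\left(1 + \frac{1}{8} + \frac{1}{8^2} + \cdots + \frac{1}{8^{k}}\right)$$

or, in condensed form for an arbitrary n

$$p_n = \frac{2}{7}\left(1 - 8^{-\left[\frac{n+1}{3}\right]}\right),$$

denoting by [x] the greatest integer contained in x. Hence, by virtue of
(2) the general expression of $r_n$ will be

$$r_n = \frac{1}{14}\left(1 - 8^{-\left[\frac{n-1}{3}\right]}\right)$$

and that of $P_n$, $Q_n$,

$$P_n = Q_n = \frac{5}{14} - \frac{2}{7} \cdot 8^{-\left[\frac{n+1}{3}\right]} - \frac{1}{14} \cdot 8^{-\left[\frac{n-1}{3}\right]}.$$

Finally, to find the probability for C to win, we observe that this can
happen only if C wins the second game; hence,

$$R_n = p_{n-1} = \frac{2}{7} - \frac{2}{7} \cdot 8^{-\left[\frac{n}{3}\right]}.$$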

Since $P_n + Q_n + R_n < 1$, the difference

$$1 - P_n - Q_n - R_n = \frac{4}{7} \cdot 8^{-\left[\frac{n+1}{3}\right]} + \frac{2}{7} \cdot 8^{-\left[\frac{n}{3}\right]} + \frac{1}{7} \cdot 8^{-\left[\frac{n-1}{3}\right]}$$


represents the probability of a tie in n games. This probability decreases
rapidly when n increases, so that in a long series of games a tie is practically
impossible. If the number of games is not limited, the probabilities
P, Q, R for A, B, C, respectively, to win are obtained as limits of
$P_n$, $Q_n$, $R_n$ when n increases indefinitely. Thus

$$P = Q = \frac{5}{14}; \qquad R = \frac{2}{7}.$$
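
The probabilities in this problem can be checked by playing out every possible string of game results. The sketch below is an illustration, not the author's method: it enumerates all $2^n$ equally likely outcome strings and compares the result with the expressions $p_j = \frac{2}{7}(1 - 8^{-[(j+1)/3]})$, $P_n = p_n + \frac{1}{4}p_{n-2}$ and $R_n = p_{n-1}$. The function names are assumptions introduced here.

```python
from fractions import Fraction
from itertools import product


def series_by_enumeration(n):
    """Exact chances for A, B, C and a tie from all 2**n equally likely outcome strings."""
    wins = {"A": 0, "B": 0, "C": 0, "tie": 0}
    for outcome in product((0, 1), repeat=n):
        playing, idle, last_winner = ["A", "B"], "C", None
        result = "tie"
        for bit in outcome:
            winner, loser = playing[bit], playing[1 - bit]
            if winner == last_winner:  # two wins in a row: both opponents beaten
                result = winner
                break
            playing, idle, last_winner = [winner, idle], loser, winner
        wins[result] += 1
    return {name: Fraction(count, 2 ** n) for name, count in wins.items()}


def series_by_formula(n):
    """The same chances from the closed expressions, for n >= 1."""
    def p(j):
        return Fraction(2, 7) * (1 - Fraction(1, 8) ** ((j + 1) // 3))

    P = p(n) + Fraction(1, 4) * p(n - 2)
    R = p(n - 1)
    return {"A": P, "B": P, "C": R, "tie": 1 - 2 * P - R}


if __name__ == "__main__":
    for n in (2, 3, 5, 8):
        print(n, series_by_enumeration(n), series_by_formula(n))
```

For large n the values approach 5/14 for A and for B and 2/7 for C, while the chance of a tie becomes negligible.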


Problems for Solution 

1. Three urns contain respectively 1 white and 2 black balls; 3 white and 1 black
ball; 2 white and 3 black balls. One ball is taken from each urn. What is the probability
that among the balls drawn there are 2 white and 1 black? Ans. $\frac{23}{60}$.

2. Cards are drawn one by one from a full deck. What is the probability that
10 cards will precede the first ace? Ans. $\frac{164}{4165} = 0.03938$.

3. Urn 1 contains 10 white and 3 black balls; urn 2 contains 3 white and 5 black
balls. Two balls are transferred from No. 1 and placed in No. 2 and then one ball is
taken from the latter. What is the probability that it is a white ball? Ans. $\frac{59}{130}$.

4. Two urns identical in appearance contain respectively 3 white and 2 black balls;
2 white and 5 black balls. One urn is selected and a ball taken from it. What is the
probability that this ball is white? Ans. $\frac{31}{70}$.

5. What is the probability that 5 tickets drawn in the French lottery all have
one-digit numbers? Ans. $\frac{7}{2441626} = 2.9 \cdot 10^{-6}$.

6. What is the probability that each of the four players in a bridge game will get a
complete suit of cards? Ans.

$$24\,\frac{(1 \cdot 2 \cdots 13)^4}{1 \cdot 2 \cdots 52} = 4.474 \cdot 10^{-28}.$$

7. What is the probability that at least one of the players in a bridge game will
get a complete suit of cards? Ans.

$$\frac{16 \cdot 13! \cdot 39! - 72 \cdot (13!)^2 \cdot 26! + 72 \cdot (13!)^4}{52!}.$$

See Sec. 5, page 31.

8. From an urn with a white and b black balls n balls are taken. Find the probability
of drawing at least one white ball. Ans. The required probability can be
expressed in two ways. First expression:

$$1 - \frac{b(b - 1) \cdots (b - n + 1)}{(a + b)(a + b - 1) \cdots (a + b - n + 1)}.$$

Second expression:

$$\frac{a}{a + b}\left[1 + \frac{b}{a + b - 1} + \cdots + \frac{b(b - 1) \cdots (b - n + 2)}{(a + b - 1)(a + b - 2) \cdots (a + b - n + 1)}\right].$$

Equating them, we have an identity

$$1 + \frac{b}{a + b - 1} + \cdots + \frac{b(b - 1) \cdots (b - n + 2)}{(a + b - 1)(a + b - 2) \cdots (a + b - n + 1)} = \frac{a + b}{a}\left[1 - \frac{b(b - 1) \cdots (b - n + 1)}{(a + b)(a + b - 1) \cdots (a + b - n + 1)}\right].$$
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9. Three players B, C in turn draw balls from an urn with 10 white and 10 black 
balls, taking one ball at a time. He who extracts the first white ball wins the game. ♦ 
Supposing that they start in the order O, find the probabilities for each of them 

to win the game. Ans. For A, 0.56584; for 0.29144; for C, 0.14271. 

10. If n dice are thrown at a time, what is the probability of having each of the
points 1, 2, ... 6 appear at least once? Find the numerical value of this
probability for n = 10. Ans.

\[ p_n = 1 - 6\left(\tfrac{5}{6}\right)^n + 15\left(\tfrac{4}{6}\right)^n - 20\left(\tfrac{3}{6}\right)^n + 15\left(\tfrac{2}{6}\right)^n - 6\left(\tfrac{1}{6}\right)^n; \]

\[ p_{10} = 0.2718. \]

Hint: Use the formula in Sec. 5, page 31. 
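The answer to Prob. 10 can be checked numerically. The following is only a sketch under the assumption that the dice are fair; the function names are illustrative and do not come from the text. It evaluates the alternating sum exactly and, for a small number of dice, compares it with a direct count of all outcomes.

```python
from fractions import Fraction
from itertools import product
from math import comb


def all_faces_probability(n: int, faces: int = 6) -> Fraction:
    """Probability that every face appears at least once in n throws,
    written as the alternating sum of the answer to Prob. 10."""
    return sum(
        (-1) ** j * comb(faces, j) * Fraction(faces - j, faces) ** n
        for j in range(faces + 1)
    )


def count_directly(n: int, faces: int = 6) -> Fraction:
    """Same probability by listing all faces**n outcomes (small n only)."""
    hits = sum(
        1 for roll in product(range(faces), repeat=n) if len(set(roll)) == faces
    )
    return Fraction(hits, faces ** n)


print(float(all_faces_probability(10)))  # to be compared with the printed 0.2718
```

The direct count is feasible only for a handful of dice, which is why the closed formula is the practical route for n = 10.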

11. In a lottery m tickets are drawn at a time out of the total number of n tickets,
and returned before the next drawing is made. What is the probability that in k
drawings each of the numbers 1, 2, ... n will appear at least once? Ans.



12. We have k varieties of objects, each variety consisting of the same number of
objects. These objects are drawn one at a time and replaced before the next drawing.
Find the probability \(p_n\) that n and no less drawings will be required to produce
objects of all varieties. Ans.

\[ k^{n-1} p_n = (k-1)^{n-1} - \frac{k-1}{1}(k-2)^{n-1} + \frac{(k-1)(k-2)}{1 \cdot 2}(k-3)^{n-1} - \cdots. \]

13. Three urns contain respectively 1 white, 2 black balls; 2 white, 1 black balls;
2 white, 2 black balls. One ball is transferred from the first urn into the second; then
one from the latter is transferred into the third; finally, one ball is drawn from the
third urn. What is the probability of its being white? Ans. 31/60.

14. Each of n urns contains a white and b black balls. One ball is transferred
from the first urn into the second, then one ball from the latter into the third, and so
on. Finally, one ball is taken from the last urn. What is the probability of its being
white? Ans. Denote by \(p_k\) the probability of drawing a white ball from the kth urn.
Then

\[ p_{k+1} = \frac{a+1}{a+b+1}\,p_k + \frac{a}{a+b+1}\,(1 - p_k) \]

for \(k = 1, 2, \ldots n-1\). Hence,

\[ p_n = \frac{a}{a+b}. \]

15. Two players A and B toss two dice, A starting the game. The game is won
by A if he casts 6 points before B casts 7 points; and it is won by B if he casts 7 points
before A casts 6 points. What are the probabilities for A and B to win the game if
they agree to cast dice not more than n times? What is the probability of a tie?
Ans. Probability for A:

\[ p_n = \tfrac{30}{61}\left[1 - \left(\tfrac{155}{216}\right)^{m}\right] \quad \text{if } n = 2m; \]

\[ p_n = \tfrac{30}{61}\left[1 - \left(\tfrac{155}{216}\right)^{m+1}\right] \quad \text{if } n = 2m + 1. \]

Probability for B:

\[ q_n = \tfrac{31}{61}\left[1 - \left(\tfrac{155}{216}\right)^{m}\right] \quad \text{if } n = 2m \text{ or } n = 2m + 1. \]

Probability of a tie:

\[ r_n = \left(\tfrac{155}{216}\right)^{m} \quad \text{if } n = 2m; \qquad r_n = \tfrac{31}{36}\left(\tfrac{155}{216}\right)^{m} \quad \text{if } n = 2m + 1. \]

If n increases indefinitely, \(r_n\) converges to 0 and \(p_n\), \(q_n\) converge to the limits

\[ p = \tfrac{30}{61}, \qquad q = \tfrac{31}{61} \]

which may be considered as the probabilities for A and B to win if the number of
throws is unlimited.
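The closed expressions of Prob. 15 can be compared with a throw-by-throw computation. The sketch below assumes that n counts the throws of both players together, A making the first, third, ... throw; this reading and the function names are assumptions made for illustration.

```python
from fractions import Fraction

P_SIX = Fraction(5, 36)    # chance of 6 points with two dice
P_SEVEN = Fraction(6, 36)  # chance of 7 points with two dice


def play(n: int):
    """Follow the game throw by throw; return (A wins, B wins, tie)."""
    a_wins = b_wins = Fraction(0)
    undecided = Fraction(1)
    for throw in range(1, n + 1):
        if throw % 2 == 1:            # A casts on odd throws
            a_wins += undecided * P_SIX
            undecided *= 1 - P_SIX
        else:                         # B casts on even throws
            b_wins += undecided * P_SEVEN
            undecided *= 1 - P_SEVEN
    return a_wins, b_wins, undecided


def closed_form(n: int):
    """The expressions given in the answer, with n = 2m or n = 2m + 1."""
    m, odd = divmod(n, 2)
    r = Fraction(155, 216)
    a_wins = Fraction(30, 61) * (1 - r ** (m + odd))
    b_wins = Fraction(31, 61) * (1 - r ** m)
    tie = r ** m * (Fraction(31, 36) if odd else 1)
    return a_wins, b_wins, tie


print(play(5) == closed_form(5))
```

Exact fractions are used so that the agreement is an identity rather than a numerical coincidence.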

16. The game known as “craps” is played with two dice, and the caster wins
unconditionally if he produces 7 or 11 points (which are called “naturals”); he loses
the game in case of 2, 3, or 12 points (called “craps”). But if he produces 4, 5, 6, 8, 9,
or 10 points, he has the right to cast the dice steadily until he throws the same number
of points he had before or until he throws a 7. If he rolls 7 before obtaining his
point, he loses the game; otherwise, he wins. What is the probability to win?

Ans. 244/495 = 0.493.
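The value 244/495 can be reproduced by a short exact computation. This is a sketch assuming fair dice; the dictionary and function names are illustrative only. Once a point is established, only that point or a 7 ends the game, so the chance of making the point is the ratio of its probability to the sum of the two.

```python
from fractions import Fraction

# Probability of each total with two fair dice: (6 - |s - 7|) ways out of 36.
prob = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}


def craps_win_probability() -> Fraction:
    win = prob[7] + prob[11]                 # naturals win at once
    for point in (4, 5, 6, 8, 9, 10):
        # After the point is set, throws other than the point and 7 are ignored.
        win += prob[point] * prob[point] / (prob[point] + prob[7])
    return win


print(craps_win_probability())  # prints 244/495
```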

17. Prove directly the identity in Prob. 15, page 37. 

Solution 1. Let

\[ \varphi(c, b) = \frac{b}{c} + \frac{b(b-1)}{c(c-1)} + \frac{b(b-1)(b-2)}{c(c-1)(c-2)} + \cdots \]

where b is a positive integer and c > b. Then

\[ \varphi(c, b) = \frac{b}{c}\,[1 + \varphi(c-1, b-1)] \]

whence

\[ \varphi(c, 1) = \frac{1}{c}; \qquad \varphi(c, 2) = \frac{2}{c-1}; \qquad \varphi(c, 3) = \frac{3}{c-2} \]

and in general

\[ \varphi(c, b) = \frac{b}{c-b+1}. \]

Taking c = a + b − 1, we have

\[ 1 + \varphi(a+b-1, b) = \frac{a+b}{a}. \]

Solution 2. The polynomial

\[ S(x) = 1 + \frac{b}{c}\,x + \frac{b(b-1)}{c(c-1)}\,x^2 + \cdots \]

can be presented in the form of a definite integral

\[ S(x) = (c+1)\int_0^1 (1-t)^{c-b}(1 - t + tx)^b\,dt \]

whence

\[ S(1) = (c+1)\int_0^1 (1-t)^{c-b}\,dt = \frac{c+1}{c-b+1} = \frac{a+b}{a} \]

if c = a + b − 1.

18. Find the approximate expressions for the probabilities P and Q in Prob. 16,
page 36, when b is a large number. Take for numerical application a = b = 50.

Solution. Since P + Q = 1, it suffices to seek the approximate expression for
P − Q. Now


whence 


- 0 = 


To find the approximate expression of this integral, we set 



whence u can be expressed as a power series in v. 

2 4?) + a - 1 12/>2 -f {2h -f- « - D* 

^ == ---- ^,2 ^ - - i ^,3 _ . . . 

26 + a - 1 (26 + a - 1 3(26 -f a - 1)'^ 

Substituting the resulting expression of du/dv and integrating with respect to v
between limits 0 and ∞, we obtain for P − Q an asymptotic expansion whose first
terms are


26 -f a - 1 


“ a — 1 a[ri 
- -f -- 

a - 1)2J 


4-(26 + g -- 1)^] (-1)^ 

(26 -I- a - 1)^ 


A more detailed discussion reveals that the error of this approximate formula is less


than and greater than 


af40(a - 1 )2 - 66(a ~ 1) + 3262 ] 

( 26 -l-a~l)e 


provided 


\(b \geq 12\). For a = b = 50 the formula yields

P − Q = 0.3318; P = 0.6659; Q = 0.3341.

CHAPTER III


REPEATED TRIALS 

1. In the theory of probability the word “trial” means an attempt to
produce, in a manner precisely described, an event E which is not certain.
The outcome of a trial is called a “success” if E occurs, and a “failure” if
E fails to occur. For instance, if E represents the drawing of two cards
of the same denomination from a full pack of cards, the “trial” consists
in taking any two cards from the full pack, and we have a success or
failure in this trial according to whether both cards are of the same
denomination or not.

If trials can be repeated, they form a “series” of trials. Regarding
series of trials, the following two problems naturally arise:

a. What is the probability of a given number of successes in a given 
series of trials? And as a generalization of this problem: 

b. What is the probability that the number of successes will be 
contained between two given limits in a given series of trials? 

Problems of this kind are among the most important in the theory of 
probability. 

2. Trials are said to be “independent” in regard to an event E if
the probability of this event in any trial remains the same, whether
the results of any number of other trials are known or not. On the other
hand, trials are “dependent” if the probability of E in a certain trial
varies according to the information we have about the outcome of one or
more of the other trials.

As an example of independent trials, imagine that several times in 
succession we draw one ball from an urn containing white and black balls 
in given proportion, after each trial returning the ball that has been 
drawn, and thoroughly mixing the balls before proceeding to the next 
trial. With respect to the color of the balls taken, we may reasonably 
assume that these trials are independent. On the other hand, if the 
balls already extracted are not returned to the urn, the above described 
trials are no longer independent. To illustrate, suppose that the urn 
from which the balls are drawn, originally contained 2 white and 3 black 
balls, and that 4 balls are drawn. What is the probability that the 
third ball is white? If nothing is known about the color of the three
other balls, the probability is 2/5. If we know that the first ball is white,
but the colors of the second and fourth balls are unknown, this probability
is 1/4. In general, the probability for any ball to be white (or black)
depends essentially on the amount of information we possess about the
color of the other balls. Since the urn contains a limited number of
balls, series of trials of this kind cannot be continued indefinitely.
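The dependence of these trials can be made concrete by counting. The sketch below is an illustration only (the names `balls`, `draws` and `probability` are not from the text): it lists every ordered way of drawing 4 of the 5 balls and counts the cases in which the third ball is white, first with no information and then knowing that the first ball was white.

```python
from fractions import Fraction
from itertools import permutations

balls = ["W", "W", "B", "B", "B"]          # 2 white and 3 black balls
draws = list(permutations(range(5), 4))    # all ordered draws of 4 balls


def probability(event, given=lambda d: True) -> Fraction:
    """P(event | given) by counting equally likely ordered draws."""
    admissible = [d for d in draws if given(d)]
    favourable = [d for d in admissible if event(d)]
    return Fraction(len(favourable), len(admissible))


def third_white(d):
    return balls[d[2]] == "W"


def first_white(d):
    return balls[d[0]] == "W"


print(probability(third_white))                     # prints 2/5
print(probability(third_white, given=first_white))  # prints 1/4
```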

As an example of an indefinite series of dependent trials, suppose that 
we have two urns, the first containing 1 white and 2 black balls, and the 
second, 1 black and 2 white balls, and the trials consist in taking one 
ball at a time from either urn, observing the following rules: (a) the
first ball is taken from the first urn; (b) after a white ball, the next is
taken from the first urn; after a black one, the next is taken from the
second urn; (c) balls are returned to the same urns from which they were
taken. 

Following these rules, we evidently have a definite series of trials, 
which can be extended indefinitely, and these trials are dependent. 
For if we know that a certain ball was white or black, the probability
of the next ball being white is 1/3 or 2/3, respectively.

Assuming the independence of trials, the probability of an event E 
may remain constant or may vary from one trial to another. If an 
unbiased coin is tossed several times, we have a series of independent 
trials each with the same probability, 1/2, for heads. It is easy to give
an example of a series of independent trials with variable probability for 
the same event. Imagine, for instance, that we have an unlimited 
number of urns with white and black balls, but that the proportion of 
white and black balls varies from urn to urn. One ball is drawn successively
from each of these urns. Evidently, here we have a series of
trials independent in regard to the white color of the ball drawn, but 
with the probability of drawing a ball of this color varying from trial to 
trial. 

In this chapter we shall discuss the simplest case of series of independent
trials with constant probability. They are often called “Bernoullian
series of trials” in honor of Jacob Bernoulli who, in his classical
book, “Ars conjectandi” (1713), made a profound study of such series
and was led to the discovery of one of the most important theorems in
the theory of probability.

3. Considering a series of n independent trials in which the probability
of an event E is p in every trial (that of the opposite event F being
q = 1 − p), the first problem which presents itself is to find the probability
that E will occur exactly m times, where m is one of the numbers
0, 1, 2, ... n. In what follows, we shall denote this probability by \(T_m\).
In the extreme cases m = n and m = 0 it is easy to find \(T_n\) and \(T_0\).
When m = n, the event E must occur n times in succession, so that \(T_n\)
represents the probability of the compound event EEE ... E with n
identical components. These components are independent events, since
the trials are independent, and the probability of each of them is p.
Hence, the compound probability is

\[ T_n = p \cdot p \cdot p \cdots p \quad (n \text{ times}) \]

or

\[ T_n = p^n. \]

The symbol \(T_0\) denotes the probability that E will never occur in n
trials, which is the same as to say that F will occur n times in succession.
Hence, for the same reasons as before,

\[ T_0 = q^n = (1 - p)^n. \]

When m is neither 0 nor n, the event consisting in m occurrences of E
can materialize in several mutually exclusive forms, each of which may
be represented by a definite succession of m letters E and n − m letters F.
For example, if n = 4 and m = 2, we can distinguish the following mutually
exclusive forms corresponding to two occurrences of E:

EEFF, EFEF, EFFE, FEEF, FEFE, FFEE.

To find the number of all the different successions consisting of m 
letters E and n — m letters F, we observe that any such succession is 
determined as soon as we know the places occupied by the letter E. 
Now the number of ways to select m places out of the total number of
n places is evidently the number of combinations out of n objects taken
m at a time. Hence, the number of mutually exclusive ways to have
m successes in n trials is

\[ C_n^m = \frac{n(n-1)\cdots(n-m+1)}{1 \cdot 2 \cdot 3 \cdots m}. \]

The probability of each succession of m letters E and n − m letters F,
by reason of independence of trials, is represented by the product of
m factors p and n − m factors q, and since the product does not depend
upon the order of factors, this probability will be

\[ p^m q^{n-m} \]

for each succession. Hence, the total probability of m successes in n
trials is given by this simple formula:

\[ T_m = \frac{n(n-1)\cdots(n-m+1)}{1 \cdot 2 \cdot 3 \cdots m}\,p^m q^{n-m} \tag{1} \]

which can also be presented thus:

\[ T_m = \frac{n!}{m!\,(n-m)!}\,p^m q^{n-m}. \tag{2} \]

This second form can be used even for m = 0 or m = n if, as usual,
we assume 0! = 1. Either of the expressions (1) or (2) shows that \(T_m\)
may be considered as the coefficient of \(t^m\) in the expansion of

\[ (q + pt)^n \]

according to ascending powers of an arbitrary variable t. In other
words, we have identically

\[ (q + pt)^n = T_0 + T_1 t + T_2 t^2 + \cdots + T_n t^n. \]

For this reason the function

\[ (q + pt)^n \]

is called the “generating function” of probabilities \(T_0, T_1, T_2, \ldots T_n\).
By setting t = 1 we naturally obtain

\[ T_0 + T_1 + T_2 + \cdots + T_n = 1. \]
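Formula (2) and the generating function can be compared directly. The following sketch (function names are illustrative, not the author's) computes the probabilities from formula (2) and, separately, multiplies out the n factors q + pt; exact fractions are used so that the two lists can be tested for equality.

```python
from fractions import Fraction
from math import comb


def binomial_probabilities(n: int, p: Fraction) -> list:
    """[T_0, ..., T_n] from formula (2)."""
    q = 1 - p
    return [comb(n, m) * p**m * q**(n - m) for m in range(n + 1)]


def expand_generating_function(n: int, p: Fraction) -> list:
    """Coefficients of (q + p t)^n, found by multiplying n linear factors."""
    q = 1 - p
    coeffs = [Fraction(1)]
    for _ in range(n):
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] += c * q          # take q from this factor
            nxt[i + 1] += c * p      # take p*t from this factor
        coeffs = nxt
    return coeffs


T = binomial_probabilities(4, Fraction(1, 3))
print(T[2])      # prints 8/27
print(sum(T))    # prints 1, as obtained by setting t = 1
```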

4. The probability P(k, l) that the number of successes m will satisfy
the inequalities (or, simply, the probability of these inequalities)

\[ k \leq m \leq l \]

where k and l are two given integers, can easily be found by distinguishing
the following mutually exclusive events:

\[ m = k \ \text{ or } \ m = k + 1, \ \ldots \ \text{ or } \ m = l. \]

Accordingly, by the theorem of total probability,

\[ P(k, l) = T_k + T_{k+1} + \cdots + T_l \]

or, using expression (2),

\[ P(k, l) = \sum_{m=k}^{l} \frac{n!}{m!\,(n-m)!}\,p^m q^{n-m}. \]

In particular, the probability that the number of successes will not
be greater than l is represented by the sum

\[ P(0, l) = q^n + \frac{n}{1}\,p q^{n-1} + \frac{n(n-1)}{1 \cdot 2}\,p^2 q^{n-2} + \cdots + \frac{n(n-1)\cdots(n-l+1)}{1 \cdot 2 \cdots l}\,p^l q^{n-l}. \]

Similarly, the probability that the number of successes in n trials will
not be less than l can be presented thus:

\[ P(l, n) = \frac{n(n-1)\cdots(n-l+1)}{1 \cdot 2 \cdots l}\,p^l q^{n-l}\left[1 + \frac{n-l}{l+1}\,\frac{p}{q} + \frac{(n-l)(n-l-1)}{(l+1)(l+2)}\left(\frac{p}{q}\right)^2 + \cdots\right] \]

where the series in the brackets ends by itself.

5. The application of the above established formulas to numerical
examples does not present any difficulty so long as the numbers with
which we have to deal are not large.

Example 1. In tossing 10 coins, what is the probability of having exactly 5 heads?
Tossing 10 different coins at once is the same thing as tossing one coin 10 times, if all
the coins are unbiased, which is assumed. Hence, the required probability is given
by formula (1), where we must take n = 10, m = 5, p = q = 1/2, and it is

\[ \frac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6}{1 \cdot 2 \cdot 3 \cdot 4 \cdot 5} \cdot \frac{1}{2^{10}} = \frac{252}{1024} = 0.24609. \]


Example 2. If a person playing a certain game can win $1 with the probability
1/3, and lose twenty-five cents with the probability 2/3, what is the probability of
winning at least $3 in 20 games? Let m be the number of times the game is won. The
total gain (considering a loss as a negative gain) will be

\[ m - \tfrac{1}{4}(20 - m) = \tfrac{5}{4}m - 5 \ \text{ dollars} \]

and the condition of the problem requires that it should not be less than $3. Hence

\[ \tfrac{5}{4}m - 5 \geq 3, \]

whence \(m \geq 6\tfrac{2}{5}\) or, since m is an integer, \(m \geq 7\). That is, in 20 trials an event with
the probability 1/3 must happen at least 7 times and the probability for that is:

\[ \sum_{m=7}^{20} \frac{20!}{m!\,(20-m)!}\left(\frac{1}{3}\right)^m\left(\frac{2}{3}\right)^{20-m}. \]

This sum contains 14 terms; but it can be expressed through another sum containing
only 7 terms, because

\[ \sum_{m=7}^{20} \frac{20!}{m!\,(20-m)!}\left(\frac{1}{3}\right)^m\left(\frac{2}{3}\right)^{20-m} = 1 - \sum_{m=0}^{6} \frac{20!}{m!\,(20-m)!}\left(\frac{1}{3}\right)^m\left(\frac{2}{3}\right)^{20-m}. \]

Using the last expression, one easily gets 0.5207 for the required probability.
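The arithmetic of Example 2 can be retraced as follows. This is a sketch only; `gain` and `at_least` are names chosen here for illustration. It finds the least number of wins giving a gain of at least $3 and evaluates the probability through the shorter complementary sum.

```python
from fractions import Fraction
from math import comb


def gain(m: int, games: int = 20) -> Fraction:
    """Net gain in dollars after m wins of $1 and (games - m) losses of 25 cents."""
    return m - Fraction(1, 4) * (games - m)


def at_least(k: int, n: int, p: Fraction) -> Fraction:
    """P(m >= k), computed as 1 minus the sum over m < k."""
    q = 1 - p
    return 1 - sum(comb(n, m) * p**m * q**(n - m) for m in range(k))


least_wins = min(m for m in range(21) if gain(m) >= 3)
print(least_wins)                                       # prints 7
print(float(at_least(least_wins, 20, Fraction(1, 3))))  # about 0.5207
```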

In the series of probabilities

\[ T_0, T_1, T_2, \ldots T_n \]

for 0, 1, 2, ... n successes in n trials, the terms generally increase till
the greatest term is reached, and then they steadily decrease. For
instance, if n = 10, p = q = 1/2, the values of the expression

\[ 2^{10}\,T_m \]

for m = 0, 1, 2, ... 10 are

1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1

so that \(T_5\) is the greatest term. For obvious reasons the number μ (to
which the greatest term in the series of probabilities \(T_0, T_1, \ldots T_n\)
corresponds) is called the “most probable” number of successes.

To prove this observation in general, and to find the rule for obtaining
μ, we observe first that the quotient

\[ \frac{T_{m+1}}{T_m} = \frac{n-m}{m+1}\cdot\frac{p}{q} \]

decreases with increasing m, so that

\[ \frac{T_1}{T_0} > \frac{T_2}{T_1} > \cdots > \frac{T_n}{T_{n-1}}. \tag{a} \]

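The two extreme terms in (a) are

\[ \frac{T_1}{T_0} = \frac{np}{q}, \qquad \frac{T_n}{T_{n-1}} = \frac{p}{nq} \]

and if n is large enough, the first of them is > 1 and the last < 1. To
find exactly how large n must be, we notice that

\[ \frac{T_1}{T_0} > 1 \]

if

\[ np > q = 1 - p \]

whence

\[ n + 1 > \frac{1}{p}. \]

Similarly,

\[ \frac{T_n}{T_{n-1}} < 1 \]

if

\[ p < nq \quad \text{or} \quad 1 - q < nq \]

whence

\[ n + 1 > \frac{1}{q}. \]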

Consequently, if n + 1 is greater than both 1/p and 1/q, the first term
in (a) is > 1 and the last term is < 1. As the terms of (a) form a decreasing
sequence, there must be a last term which is ≥ 1. Let it be

\[ \frac{T_\mu}{T_{\mu-1}} \geq 1. \]

Then

\[ \frac{T_1}{T_0} > \frac{T_2}{T_1} > \cdots > \frac{T_\mu}{T_{\mu-1}} \geq 1 \]

and

\[ 1 > \frac{T_{\mu+1}}{T_\mu} > \cdots > \frac{T_n}{T_{n-1}} \]

or, which is the same,

\[ T_0 < T_1 < T_2 < \cdots < T_{\mu-1} \leq T_\mu; \qquad T_\mu > T_{\mu+1} > \cdots > T_n. \]


In other words, the sequence of probabilities increases till the greatest
term \(T_\mu\) is reached and steadily decreases from then on. Besides \(T_\mu\),
there may be another greatest term \(T_{\mu-1}\); namely, when \(T_{\mu-1} = T_\mu\),
but all the other terms are certainly less than \(T_\mu\). The number μ is
perfectly determined by the conditions

\[ \frac{T_\mu}{T_{\mu-1}} = \frac{n-\mu+1}{\mu}\cdot\frac{p}{q} \geq 1, \qquad \frac{T_{\mu+1}}{T_\mu} = \frac{n-\mu}{\mu+1}\cdot\frac{p}{q} < 1 \]

which are equivalent to the two inequalities

\[ (n+1)p \geq \mu(p+q), \qquad np - q < \mu(p+q). \]

These in turn can be presented thus:

\[ \mu \leq (n+1)p < \mu + 1 \]

and show that μ is uniquely determined as the greatest integer contained in
(n + 1)p. If (n + 1)p is an integer, then μ = (n + 1)p and \(T_\mu = T_{\mu-1}\).
That is, there are two greatest terms if, and only if, (n + 1)p is an
integer.

Let us consider now what happens if

\[ n + 1 \leq \frac{1}{p} \]

or

\[ n + 1 \leq \frac{1}{q}. \]

In the first case, all the terms in (a) are less than 1 with the single exception
of the first term \(T_1/T_0\) which may be equal to 1; namely, when

\[ n + 1 = \frac{1}{p}. \]

Consequently,

\[ T_0 \geq T_1 > T_2 > \cdots > T_n \]

so that \(T_0\) is the greatest term. If (n + 1)p < 1 the greatest integer
contained in (n + 1)p is 0, and there is only one greatest term \(T_0\). If,
however, (n + 1)p = 1, there are two terms \(T_0 = T_1\) greater than
others.

If \((n + 1)q \leq 1\), all the terms in series (a) are > 1 with the exception
of the last term, which may be equal to 1; namely, when (n + 1)q = 1.
Hence,

\[ T_0 < T_1 < \cdots < T_{n-1} \leq T_n \]

so that \(T_n\) is the greatest term, and the preceding term \(T_{n-1}\) can be equal
to it only if (n + 1)q = 1. Now the condition

\[ (n + 1)q \leq 1 \]

is equivalent to

\[ (n + 1)p \geq n. \]

On the other hand, because p < 1,

\[ (n + 1)p < n + 1. \]

Therefore n is the greatest integer contained in (n + 1)p.

Comparing the results obtained in the last two cases (excluded at
first) with the general rule, we see that in all cases the greatest term
corresponds to

\[ \mu = [(n + 1)p]. \]

If (n + 1)p is an integer, then there are two greatest terms \(T_\mu\) and \(T_{\mu-1}\).
This rule for determining the most probable number of successes is very
simple and easy of application to numerical examples.


Example 1. Let n = 20, p = 2/5. Then (n + 1)p = 8.4, and the greatest
integer contained in this number is μ = 8. Hence, there is only one most probable
number of successes μ = 8 with the corresponding probability

\[ T_8 = \frac{20!}{8!\,12!}\left(\frac{2}{5}\right)^8\left(\frac{3}{5}\right)^{12} = 0.1797. \]

Example 2. Let n = 110, p = 1/3. Then (n + 1)p = 37, an integer.
Consequently, 36 and 37 are the most probable numbers of successes with the
corresponding probability

\[ T_{36} = T_{37} = \frac{110!}{37!\,73!}\left(\frac{1}{3}\right)^{37}\left(\frac{2}{3}\right)^{73} = 0.0801. \]
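The rule μ = [(n + 1)p] lends itself to a direct check. In the sketch below (names are illustrative) the rule is compared with an exhaustive search for the greatest term; p is passed as an exact fraction so that ties between two greatest terms are detected reliably.

```python
from fractions import Fraction
from math import comb, floor


def T(n: int, m: int, p: Fraction) -> Fraction:
    """Probability of exactly m successes in n trials, formula (2)."""
    return comb(n, m) * p**m * (1 - p)**(n - m)


def most_probable(n: int, p: Fraction) -> list:
    """Rule of the text: mu = [(n+1)p]; mu - 1 ties with mu when (n+1)p is an integer."""
    x = (n + 1) * p
    mu = floor(x)
    if x == mu and mu >= 1:
        return [mu - 1, mu]
    return [mu]


def most_probable_by_search(n: int, p: Fraction) -> list:
    """All m at which T_m attains its greatest value."""
    terms = [T(n, m, p) for m in range(n + 1)]
    best = max(terms)
    return [m for m, t in enumerate(terms) if t == best]


print(most_probable(20, Fraction(2, 5)), float(T(20, 8, Fraction(2, 5))))
print(most_probable(110, Fraction(1, 3)), float(T(110, 37, Fraction(1, 3))))
```

The two printed lines correspond to Examples 1 and 2 above.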


7. When n, m, and n − m are large numbers, the evaluation of
probability \(T_m\) by the exact formula

\[ T_m = \frac{n!}{m!\,(n-m)!}\,p^m q^{n-m} \]

becomes impracticable and it is necessary to resort to approximations.
For approximate evaluation of large factorials we possess precious means
in the famous “Stirling formula.” Referring the reader to Appendix I
where this formula is established, we shall use it here in the following
form:

\[ \log x! = \log\sqrt{2\pi x} + x\log x - x + \omega(x) \]

where

\[ \frac{1}{12\left(x + \frac{1}{2}\right)} < \omega(x) < \frac{1}{12x}. \]

In the same appendix the following double inequality is proved:

\[ \frac{1}{12n} - \frac{1}{12m} - \frac{1}{12l} < \omega(n) - \omega(m) - \omega(l) < \frac{1}{12n+6} - \frac{1}{12m+6} - \frac{1}{12l+6}. \]

Now from Stirling's formula

\[ n! = \sqrt{2\pi n}\; n^n e^{-n+\omega(n)} \]

and two similar expressions for m! and (n − m)! follow. Substituting
them into \(T_m\), we get two limits

\[ T_m < k\,\sqrt{\frac{n}{2\pi m(n-m)}}\left(\frac{np}{m}\right)^m\left(\frac{nq}{n-m}\right)^{n-m} \tag{3} \]

\[ T_m > l\,\sqrt{\frac{n}{2\pi m(n-m)}}\left(\frac{np}{m}\right)^m\left(\frac{nq}{n-m}\right)^{n-m} \tag{4} \]

where

\[ k = e^{\frac{1}{12n+6} - \frac{1}{12m+6} - \frac{1}{12(n-m)+6}}, \qquad l = e^{\frac{1}{12n} - \frac{1}{12m} - \frac{1}{12(n-m)}}. \]

When n, m, n − m are even moderately large, k and l differ little from
each other.

Inequalities (3) and (4) then give very close upper and lower limits
for \(T_m\). To evaluate powers

\[ \left(\frac{np}{m}\right)^m, \qquad \left(\frac{nq}{n-m}\right)^{n-m} \]

with large exponents, sufficiently extensive logarithmic tables must be
available. If such tables are lacking, then in cases which ordinarily
occur when ratios np/m and nq/(n − m) are close to 1, we can use
special short tables to evaluate logarithms of these ratios or else resort to
series.
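Limits (3) and (4) are easily evaluated by machine. The sketch below is illustrative only: the function names are not from the text, and the exact value used for comparison is obtained with integer arithmetic, which assumes p is a ratio of two integers. Logarithms are used for the common factor, in the spirit of the remark about logarithmic tables.

```python
from math import comb, exp, log, pi


def stirling_limits(n: int, m: int, p: float):
    """Return (lower, upper) limits for T_m from inequalities (4) and (3)."""
    q = 1.0 - p
    # log of sqrt(n / (2 pi m (n-m))) * (np/m)^m * (nq/(n-m))^(n-m)
    log_main = (0.5 * log(n / (2 * pi * m * (n - m)))
                + m * log(n * p / m)
                + (n - m) * log(n * q / (n - m)))
    log_k = 1 / (12 * n + 6) - 1 / (12 * m + 6) - 1 / (12 * (n - m) + 6)
    log_l = 1 / (12 * n) - 1 / (12 * m) - 1 / (12 * (n - m))
    return exp(log_main + log_l), exp(log_main + log_k)


def exact_T(n: int, m: int, num: int, den: int) -> float:
    """Exact T_m for p = num/den, evaluated with whole numbers before dividing."""
    return comb(n, m) * num**m * (den - num)**(n - m) / den**n


lower, upper = stirling_limits(20, 8, 0.4)
print(lower, exact_T(20, 8, 2, 5), upper)
```

For very large n the two limits agree to so many figures that ordinary floating-point rounding becomes comparable to their difference; the limits remain useful, but a strict comparison then requires more careful arithmetic than this sketch attempts.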

8. Another problem requiring the probability that the number of
successes will be contained between two given limits is much more
complex in case the number of trials as well as the difference between
given limits is a large number. Ordinarily for approximate evaluation
of probability under such circumstances simple and convenient formulas
are used. These formulas are derived in Chap. VII. Less known is
the ingenious use by Markoff of continued fractions for that purpose.

It suffices to devise a method for approximate evaluation of the
probability that the number of successes will be greater than a given
integer l which can be supposed > np. We shall denote this probability by
P(l). A similar notation Q(l) will be used to denote the probability
that the number of failures is > l where again l > nq. The probability
P(k, l) of the inequalities \(k \leq m \leq l\) can be expressed as follows:

\[ P(k, l) = 1 - P(l) - Q(n - k) \]

if l > np and k < np;

\[ P(k, l) = P(k - 1) - P(l) \]

if both k and l are > np; and finally

\[ P(k, l) = Q(n - l - 1) - Q(n - k) \]

if both k and l are < np.

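For P(l) we have the expression

\[ P(l) = \frac{n!}{(l+1)!\,(n-l-1)!}\,p^{l+1}q^{n-l-1}\left[1 + \frac{n-l-1}{l+2}\,\frac{p}{q} + \frac{(n-l-1)(n-l-2)}{(l+2)(l+3)}\left(\frac{p}{q}\right)^2 + \cdots\right]. \]

The first factor

\[ \frac{n!}{(l+1)!\,(n-l-1)!}\,p^{l+1}q^{n-l-1} \]

can be approximately evaluated by the method of the preceding section.
The whole difficulty resides in the evaluation of the sum

\[ S = 1 + \frac{n-l-1}{l+2}\,\frac{p}{q} + \frac{(n-l-1)(n-l-2)}{(l+2)(l+3)}\left(\frac{p}{q}\right)^2 + \cdots \]

which is a particular case of the hypergeometric series

\[ F(\alpha, \beta, \gamma, x) = 1 + \frac{\alpha\beta}{1 \cdot \gamma}\,x + \frac{\alpha(\alpha+1)\beta(\beta+1)}{1 \cdot 2 \cdot \gamma(\gamma+1)}\,x^2 + \cdots. \]

In fact

\[ S = F\left(-n + l + 1,\ 1,\ l + 2,\ -\frac{p}{q}\right). \]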

Now, owing to this connection between S and hypergeometric series, S
can be represented in the form of a continued fraction. First, it is
easy to establish the following relations:

\[ F(\alpha, \beta+1, \gamma+1, x) = F(\alpha, \beta, \gamma, x) + \frac{\alpha(\gamma-\beta)}{\gamma(\gamma+1)}\,x\,F(\alpha+1, \beta+1, \gamma+2, x) \]

\[ F(\alpha+1, \beta, \gamma+1, x) = F(\alpha, \beta, \gamma, x) + \frac{\beta(\gamma-\alpha)}{\gamma(\gamma+1)}\,x\,F(\alpha+1, \beta+1, \gamma+2, x). \]

Substituting α + n, β + n, γ + 2n and α + n, β + n + 1, γ + 2n + 1,
respectively, for α, β, γ in these relations and setting

\[ X_{2n} = F(\alpha+n, \beta+n, \gamma+2n, x); \qquad X_{2n+1} = F(\alpha+n, \beta+n+1, \gamma+2n+1, x) \]

\[ a_{2n} = \frac{(\beta+n)(\gamma-\alpha+n)}{(\gamma+2n)(\gamma+2n-1)}; \qquad a_{2n+1} = \frac{(\alpha+n)(\gamma-\beta+n)}{(\gamma+2n)(\gamma+2n+1)} \]

for brevity, we have

\[ X_0 = X_1 - a_1 x X_2 \]
\[ X_1 = X_2 - a_2 x X_3 \]
\[ \cdots \]
\[ X_{m-1} = X_m - a_m x X_{m+1} \]

whence

\[ \frac{X_1}{X_0} = \cfrac{1}{1 - \cfrac{a_1 x}{1 - \cfrac{a_2 x}{1 - \ddots - \cfrac{a_{m-1} x}{1 - \cfrac{a_m x}{X_m / X_{m+1}}}}}}. \]

In our particular case

\[ X_0 = 1, \qquad X_1 = F(-n + l + 1,\ 1,\ l + 2,\ x), \]

and \(a_{2n-2l-1} = 0\). Hence, taking \(x = -p/q\) and introducing new notations, we have a
finite continued fraction

\[ S = \cfrac{1}{1 - \cfrac{c_1}{1 + \cfrac{d_1}{1 - \cfrac{c_2}{1 + \cfrac{d_2}{1 - \ddots - \cfrac{c_{n-l-1}}{1 + d_{n-l-1}}}}}}} \tag{5} \]

where

\[ c_k = \frac{(n-k-l)(l+k)\,p}{(l+2k-1)(l+2k)\,q}; \qquad d_k = \frac{k(n+k)\,p}{(l+2k)(l+2k+1)\,q}. \tag{6} \]


Every one of the numbers \(c_k\) will be positive and < 1 if this is true for
\(c_1\). Now

\[ c_1 = \frac{(n-l-1)\,p}{(l+2)\,q} < 1 \]

if l > np, and that is exactly what we suppose. The above continued
fraction can be used to obtain approximate values of S in excess or in
defect, as we please. Let us denote the continued fraction

\[ \cfrac{c_k}{1 + \cfrac{d_k}{1 - \cfrac{c_{k+1}}{1 + \ddots}}} \]

by \(\omega_k\). Then

\[ 0 < \omega_k < c_k, \]

which can be easily verified. Furthermore,

\[ S = \frac{1}{1 - \omega_1}; \qquad \omega_1 = \cfrac{c_1}{1 + \cfrac{d_1}{1 - \omega_2}}; \qquad \omega_2 = \cfrac{c_2}{1 + \cfrac{d_2}{1 - \omega_3}} \]

and in general

\[ \omega_k = \cfrac{c_k}{1 + \cfrac{d_k}{1 - \omega_{k+1}}}. \]

Having selected k, depending on the degree of approximation we
desire in the final result (but never too large; k = 5 or less generally
suffices), we use the inequality

\[ 0 < \omega_{k+1} < c_{k+1} \]

to obtain two limits in defect and in excess for \(\omega_k\). Using these limits, we
obtain similar limits for \(\omega_{k-1}, \omega_{k-2}, \omega_{k-3}, \ldots\) and, finally, for \(\omega_1\) and S.
The series of operations will be better illustrated by an example.

9. Let us find approximately the probability that in 9,000 trials an
event with the probability p = 1/3 will occur not more than 3,090 times
and not less than 2,910 times. To this end we must first seek the
probability of more than 3,090 occurrences, which involves, in the first
place, the evaluation of

\[ T_{3091} = \frac{9000!}{3091!\,5909!}\left(\frac{1}{3}\right)^{3091}\left(\frac{2}{3}\right)^{5909}. \]

By using inequalities (3) and (4) of Sec. 7, we find

\[ 0.0011286 < T_{3091} < 0.0011287. \]

Next we turn to the continued fraction to evaluate the sum S. The
following table gives approximate values of \(c_1, c_2, \ldots c_6\) and \(d_1, d_2, \ldots d_5\)
to 5 decimals and less than the exact numbers:

    n      c_n        d_n
    1    0.95553    0.00047
    2    0.95444    0.00094
    3    0.95335    0.00140
    4    0.95227    0.00187
    5    0.95119    0.00234
    6    0.95010

We start with the inequalities

\[ 0 < \omega_6 < 0.95011 \]

and then proceed as follows:

\[ 1.00234 < 1 + \frac{d_5}{1 - \omega_6} < 1.04711; \qquad 0.90839 < \omega_5 < 0.94898 \]

\[ 1.02041 < 1 + \frac{d_4}{1 - \omega_5} < 1.03685; \qquad 0.91842 < \omega_4 < 0.93324 \]

\[ 1.01716 < 1 + \frac{d_3}{1 - \omega_4} < 1.02113; \qquad 0.93362 < \omega_3 < 0.93728 \]

\[ 1.01416 < 1 + \frac{d_2}{1 - \omega_3} < 1.01514; \qquad 0.94020 < \omega_2 < 0.94113 \]

\[ 1.00785 < 1 + \frac{d_1}{1 - \omega_2} < 1.00816; \qquad 0.94779 < \omega_1 < 0.94810 \]

\[ 0.05190 < 1 - \omega_1 < 0.05221 \]

\[ 0.02161 < S\,T_{3091} < 0.02175. \]

Hence, we know for certain that

\[ 0.02161 < P(3{,}090) < 0.02175. \]

By a similar calculation it was found that

\[ 0.02129 < Q(6{,}090) < 0.02142 \]

so that

\[ 0.04290 < P(3{,}090) + Q(6{,}090) < 0.04317. \]

The required probability P that the number of successes will be contained
between 2,910 and 3,090 (limits included) lies between 0.95683 and
0.95710 so that, taking P = 0.9570, the error in absolute value will be
less than \(1.7 \times 10^{-4}\).
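The operations of Secs. 8 and 9 can be retraced by machine. The sketch below is only an illustration under stated assumptions: the function names are invented here, `ratio` stands for p/q, and the exact probabilities are obtained by whole-number arithmetic for p = 1/3. It evaluates the coefficients of formula (6), the finite continued fraction (5), the limits for the first partial quotient obtained by starting from the sixth, and an exact sum for comparison.

```python
from math import comb


def c(k, n, l, ratio):
    """Coefficient c_k of formula (6); ratio stands for p/q."""
    return (n - k - l) * (l + k) * ratio / ((l + 2 * k - 1) * (l + 2 * k))


def d(k, n, l, ratio):
    """Coefficient d_k of formula (6)."""
    return k * (n + k) * ratio / ((l + 2 * k) * (l + 2 * k + 1))


def series_S(n, l, ratio):
    """S summed term by term: 1 + (n-l-1)/(l+2) * ratio + ..."""
    total = term = 1.0
    for j in range(n - l - 1):
        term *= (n - l - 1 - j) / (l + 2 + j) * ratio
        total += term
    return total


def continued_fraction_S(n, l, ratio):
    """Finite continued fraction (5), evaluated from the innermost level outward."""
    omega = 0.0                       # c_{n-l} = 0, so the fraction stops there
    for k in range(n - l - 1, 0, -1):
        omega = c(k, n, l, ratio) / (1 + d(k, n, l, ratio) / (1 - omega))
    return 1 / (1 - omega)


def omega1_limits(start, n, l, ratio):
    """Limits for omega_1 that follow from 0 < omega_start < c_start."""
    low, high = 0.0, c(start, n, l, ratio)
    for k in range(start - 1, 0, -1):
        ck, dk = c(k, n, l, ratio), d(k, n, l, ratio)
        # omega_k decreases as omega_{k+1} grows, so the limits exchange roles.
        low, high = ck / (1 + dk / (1 - high)), ck / (1 + dk / (1 - low))
    return low, high


def exact_probability(n, first, last, num, den):
    """P(first <= m <= last) for p = num/den, using whole numbers throughout."""
    binom = comb(n, first)
    weight = num**first * (den - num)**(n - first)
    total = 0
    for m in range(first, last + 1):
        total += binom * weight
        binom = binom * (n - m) // (m + 1)             # C(n, m+1) from C(n, m)
        if m < n:
            weight = weight * num // (den - num)       # exact division
    return total / den**n


n, l, ratio = 9000, 3090, 0.5                  # p = 1/3, hence p/q = 1/2
T_3091 = exact_probability(n, l + 1, l + 1, 1, 3)
S = continued_fraction_S(n, l, ratio)
print(omega1_limits(6, n, l, ratio))
print(S * T_3091, exact_probability(n, l + 1, n, 1, 3))
print(exact_probability(n, 2910, 3090, 1, 3))
```

The last printed value may be set against P = 0.9570 found above; the second line compares the product of S and the first factor with a direct summation of the tail.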


Problems for Solution 

1. What is the probability of having 12 three times in 100 tosses of 2 dice?

Ans. \(\frac{100!}{3!\,97!}\left(\frac{1}{36}\right)^3\left(\frac{35}{36}\right)^{97} = 0.2257\).

2. What is the probability for an event E to occur at least once, or twice, or three
times, in a series of n independent trials with the probability p? Ans.

\[ \text{(a)}\ 1 - (1-p)^n; \qquad \text{(b)}\ 1 - (1-p)^{n-1}[1 + (n-1)p]; \]

\[ \text{(c)}\ 1 - (1-p)^{n-2}\left[1 + (n-2)p + \frac{(n-1)(n-2)}{2}\,p^2\right]. \]

3. What is the probability of having 12 points with 2 dice at least three times in
100 throws? Ans. 0.528.

4. In a series of 100 independent trials with the probability 1/3, what is the most
probable number of successes and its probability? Ans. μ = 33; \(T_{33} = 0.0844\).

Note: Log 100! = 157.97000; Log 67! = 94.56195; Log 33! = 36.93869.

5. A player wins $1 if he throws heads two times in succession; otherwise he loses
25 cents. If this game is repeated 100 times, what is the probability that neither his
gain nor loss will exceed $1? Or $5? Ans.

\[ \frac{100!}{20!\,80!}\left(\frac{1}{4}\right)^{20}\left(\frac{3}{4}\right)^{80} = 0.0493; \]

\[ \frac{100!}{20!\,80!}\left(\frac{1}{4}\right)^{20}\left(\frac{3}{4}\right)^{80}\left[1 + \frac{80}{63} + \frac{80 \cdot 79}{63 \cdot 66} + \frac{80 \cdot 79 \cdot 78}{63 \cdot 66 \cdot 69} + \frac{80 \cdot 79 \cdot 78 \cdot 77}{63 \cdot 66 \cdot 69 \cdot 72} + \frac{60}{81} + \frac{60 \cdot 57}{81 \cdot 82} + \frac{60 \cdot 57 \cdot 54}{81 \cdot 82 \cdot 83} + \frac{60 \cdot 57 \cdot 54 \cdot 51}{81 \cdot 82 \cdot 83 \cdot 84}\right]. \]

Note: Log 20! = 18.38612; Log 80! = 118.85473.

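6. Show that in a series of 2s trials with the probability 1/2 the most probable
number of successes is s and the corresponding probability

\[ T_s = \frac{1 \cdot 3 \cdot 5 \cdots (2s-1)}{2 \cdot 4 \cdot 6 \cdots 2s}. \]

Show also that

\[ T_s < \frac{1}{\sqrt{2s+1}}. \]

Hint:

\[ T_s < \frac{2 \cdot 4 \cdot 6 \cdots 2s}{3 \cdot 5 \cdot 7 \cdots (2s+1)}. \]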

7. Prove the following theorem: If P and P' are probabilities of the most probable
number of successes, respectively, in n and n + 1 trials, then \(P' \leq P\), the equality
sign being excluded unless (n + 1)p is an integer.

8. Show that the probability \(T_\mu\) corresponding to the most probable number of
successes in n trials is asymptotic to \((2\pi npq)^{-1/2}\); that is,

\[ \lim\, T_\mu \sqrt{2\pi npq} = 1 \quad \text{as } n \to \infty. \]

9. When p = 1/2, the following inequality holds for every m:


if

10. What is the probability of 215 successes in 1,000 trials if p = 1/5?

Ans. 0.0154.

11. What is the probability that in 2,000 trials the number of successes will be
contained between 460 and 540 (limits included) if p = 1/4? Ans. 0.964.

12. Two players A and B agree to play until one of them wins a certain number of
games, the probabilities for A and B to win a single game being p and q = 1 − p.
However, they are forced to quit when A has a games still to win, and B has b games.
How should they divide their total stake to be fair?

This problem is known as “problème de parties,” one of the first problems on
probability discussed and solved by Fermat and Pascal in their correspondence.

Solution 1. Let P denote the probability that A will win a remaining games before
B can win b games, and let Q = 1 − P denote the probability for B to win b games
before A wins a games. To be fair, the players must divide their common stake M in
the ratio P:Q and leave the sum MP to A and the sum MQ to B.

To find P, notice that A wins in the following mutually exclusive ways:

a. If he wins in exactly a games; probability \(p^a\).

b. If he wins in exactly a + 1 games; probability \(\frac{a}{1}\,p^a q\).

c. If he wins in exactly a + 2 games; probability \(\frac{a(a+1)}{1 \cdot 2}\,p^a q^2\).

. . . . .

If he wins in exactly a + b − 1 games; probability
\(\frac{a(a+1)\cdots(a+b-2)}{1 \cdot 2 \cdots (b-1)}\,p^a q^{b-1}\).

Consequently

\[ P = p^a\left[1 + \frac{a}{1}\,q + \frac{a(a+1)}{1 \cdot 2}\,q^2 + \cdots + \frac{a(a+1)\cdots(a+b-2)}{1 \cdot 2 \cdots (b-1)}\,q^{b-1}\right] \]

and similarly

\[ Q = q^b\left[1 + \frac{b}{1}\,p + \frac{b(b+1)}{1 \cdot 2}\,p^2 + \cdots + \frac{b(b+1)\cdots(b+a-2)}{1 \cdot 2 \cdots (a-1)}\,p^{a-1}\right]. \]

Show directly that P + Q = 1.

Hint: \(\dfrac{dP}{dp} + \dfrac{dQ}{dp} = 0\).

Solution 2. The same problem can be solved in a different way. Whether A or B
wins will be decided in not more than a + b − 1 games. Now if the players continue
to play until the number of games reaches the limit a + b − 1, the number of games
won by A must be not less than a. And conversely, if this number is not less than a, A
will win a games before B wins b games. Therefore, P is the probability that in
a + b − 1 games A wins not less than a times, or

\[ P = \frac{(a+b-1)!}{a!\,(b-1)!}\,p^a q^{b-1}\left[1 + \frac{b-1}{a+1}\,\frac{p}{q} + \frac{(b-1)(b-2)}{(a+1)(a+2)}\left(\frac{p}{q}\right)^2 + \cdots + \frac{(b-1)(b-2)\cdots 2 \cdot 1}{(a+1)(a+2)\cdots(a+b-1)}\left(\frac{p}{q}\right)^{b-1}\right]. \]

Show directly that both expressions for P are identical.

Hint: Proceed as before.
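The agreement of the two solutions of Prob. 12 can also be observed numerically. The following sketch is illustrative (function names are not from the text): it evaluates P by the first solution, where the game stops as soon as A has won a games, and by the second, where all a + b − 1 games are imagined played, and checks that P + Q = 1.

```python
from fractions import Fraction
from math import comb


def p_solution_1(a: int, b: int, p: Fraction) -> Fraction:
    """A wins in exactly a + j games, j = 0, ..., b - 1 (Solution 1)."""
    q = 1 - p
    # a(a+1)...(a+j-1) / (1*2*...*j) equals comb(a + j - 1, j)
    return p**a * sum(comb(a + j - 1, j) * q**j for j in range(b))


def p_solution_2(a: int, b: int, p: Fraction) -> Fraction:
    """A wins at least a of a + b - 1 games (Solution 2)."""
    q = 1 - p
    n = a + b - 1
    return sum(comb(n, m) * p**m * q**(n - m) for m in range(a, n + 1))


P = p_solution_1(2, 3, Fraction(1, 2))
Q = p_solution_1(3, 2, Fraction(1, 2))   # roles of A and B exchanged
print(P, Q)  # prints 11/16 5/16
```

With equal skill and A needing 2 games against 3 for B, the stake is thus divided in the ratio 11:5.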
13. Prove the identity

\[ p^n + \frac{n}{1}\,p^{n-1}q + \frac{n(n-1)}{1 \cdot 2}\,p^{n-2}q^2 + \cdots + \frac{n(n-1)\cdots(n-k+1)}{1 \cdot 2 \cdot 3 \cdots k}\,p^{n-k}q^k = \frac{\displaystyle\int_0^p x^{n-k-1}(1-x)^k\,dx}{\displaystyle\int_0^1 x^{n-k-1}(1-x)^k\,dx}. \]

Hint: Take derivatives with respect to p.

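14. A and B have, respectively, n + 1 and n coins. If they toss their coins
simultaneously, what is the probability that (a) A will have more heads than B?
(b) A and B will have an equal number of heads? (c) B will have more heads than A?

Solution. a. Let \(P_n\) be the probability for A to have more heads than B. This
probability can be expressed as the double sum

\[ P_n = \frac{1}{2^{2n+1}} \sum_{x=1}^{n+1} \sum_{\alpha=0}^{n} C_{n+1}^{\alpha+x}\, C_n^{\alpha}. \]

Considering the coefficient of \(t^x\) in

\[ (1 + t)^{n+1}\left(1 + \frac{1}{t}\right)^n = \frac{(1+t)^{2n+1}}{t^n} \]

we have

\[ \sum_{\alpha=0}^{n} C_{n+1}^{\alpha+x}\, C_n^{\alpha} = C_{2n+1}^{n+x}. \]

Hence

\[ P_n = \frac{1}{2^{2n+1}} \sum_{x=1}^{n+1} C_{2n+1}^{n+x} = \frac{1}{2}. \]

b. The probability \(Q_n\) for A and B to have an equal number of heads is

\[ Q_n = \frac{1}{2^{2n+1}} \sum_{\alpha=0}^{n} C_{n+1}^{\alpha}\, C_n^{\alpha} = \frac{C_{2n+1}^{n}}{2^{2n+1}}. \]

c. The probability \(R_n\) for B to have more heads than A is

\[ R_n = \frac{1}{2} - \frac{C_{2n+1}^{n}}{2^{2n+1}}. \]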

15. If each of n independent trials can result in one of the m incompatible events
$E_1, E_2, \ldots E_m$ with the respective probabilities

$p_1, p_2, \ldots p_m$; ($p_1 + p_2 + \cdots + p_m = 1$),

show that the probability to have $l_1$ events $E_1$, $l_2$ events $E_2$, . . . $l_m$ events $E_m$, where
$l_1 + l_2 + \cdots + l_m = n$, is given by

$$\frac{n!}{l_1!\,l_2!\cdots l_m!}\,p_1^{l_1}p_2^{l_2}\cdots p_m^{l_m}.$$
PROBABILITIES OF HYPOTHESES AND BAYES' THEOREM

1. The nature of the problems with which we deal in this chapter may 
be illustrated by the following simple example: Urns 1 and 2 contain, 
respectively, 2 white and 3 black balls, and 4 white and 1 black balls. 
One of the urns is selected at random and one ball is drawn. It happens
to be white. What is the probability that it came from the first urn? 
Before the ball was drawn and its color revealed, the probability that the 
first urn would be chosen had been 1/2; but the indication of the color 
of the ball that was drawn altered this probability. To find this new 
probability, the following artifice can be used: 

Imagine that balls from both urns are put together in a third urn. 
To distinguish their origin, balls from the first urn are marked with 1 
and those from the second urn are marked with 2. Since there are 5
balls marked with 1 and the same number marked with 2, in taking one 
ball from the third urn we have equal chances to take one coming from 
either the first or the second urn, and the situation is exactly the same 
as if we chose one of the urns at random and drew one ball from it. 
If the ball drawn from the third urn happens to be white, this can happen 
in 2 + 4 = 6 equally likely cases. Only in 2 of these cases will the 
extracted ball have the mark 1. Hence, the probability that the white 
ball came from the first urn is

$$\frac{2}{6} = \frac{1}{3}.$$

The success of this artifice depends on the equality of the number of 
balls in both urns. It can be applied to the case of an unequal number 
of balls in the urns, but with some modifications; however, it seems
preferable to follow a regular method for solving problems like the 
preceding one. 

2. The problem just solved is a particular case of the following fundamental
problem:

Problem 1. An event A can occur only if one of the set of exhaustive
and incompatible events

$B_1, B_2, \ldots B_n$

occurs. The probabilities of these events

$(B_1), (B_2), \ldots (B_n)$

corresponding to the total absence of any knowledge as to the occurrence
or nonoccurrence of A, are known. Known also are the conditional
probabilities

$(A, B_i)$; $i = 1, 2, \ldots n$

for A to occur, assuming the occurrence of $B_i$. How does the probability
of $B_i$ change with the additional information that A has actually
happened?

Solution. The question amounts to finding the conditional probability
$(B_i, A)$. The probability of the compound event $AB_i$ can be
presented in two forms

$$(AB_i) = (B_i)(A, B_i)$$

or

$$(AB_i) = (A)(B_i, A).$$

Equating the right-hand members, we derive the following expression
for the unknown probability $(B_i, A)$:

$$(B_i, A) = \frac{(B_i)(A, B_i)}{(A)}.$$

Since the event A can materialize in the mutually exclusive forms

$AB_1, AB_2, \ldots AB_n$,

by applying the theorem of total probability, we get

$$(A) = (B_1)(A, B_1) + (B_2)(A, B_2) + \cdots + (B_n)(A, B_n).$$

It suffices now to introduce this expression into the preceding formula for
$(B_i, A)$ to get the final expression

$$(B_i, A) = \frac{(B_i)(A, B_i)}{(B_1)(A, B_1) + (B_2)(A, B_2) + \cdots + (B_n)(A, B_n)}. \tag{1}$$

This formula, when described in words, constitutes the so-called
"Bayes' theorem." However, it is hardly necessary to describe its
content in words; symbols speak better for themselves. For that
reason, we prefer to speak of Bayes' formula rather than of Bayes'
theorem. Bayes' formula is also known as the "formula for probabilities
of hypotheses." The reason for that name is that the events $B_1, B_2, \ldots B_n$
may be considered as hypotheses to account for the occurrence of A.
It is customary to speak of probabilities

$(B_1), (B_2), \ldots (B_n)$

as a priori probabilities of hypotheses

$B_1, B_2, \ldots B_n$,

while probabilities

$(B_i, A)$; $i = 1, 2, \ldots n$

are called a posteriori probabilities of the same hypotheses.
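
Bayes' formula translates directly into a few lines of code. The sketch below is only an illustration (the function name `posterior` and its argument names are not from the text): it takes the a priori probabilities $(B_i)$ and the conditional probabilities $(A, B_i)$ and returns the a posteriori probabilities $(B_i, A)$. It is applied to the two urns of Sec. 1, which hold 2 white balls out of 5 and 4 white balls out of 5.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' formula: a posteriori probabilities (B_i, A).

    priors[i]      is the a priori probability (B_i);
    likelihoods[i] is the conditional probability (A, B_i).
    """
    joint = [b * a for b, a in zip(priors, likelihoods)]
    total = sum(joint)          # (A), by the theorem of total probability
    return [x / total for x in joint]

half = Fraction(1, 2)
print(posterior([half, half], [Fraction(2, 5), Fraction(4, 5)]))
```

The first entry of the result is the probability that the white ball came from the first urn, and it agrees with the value found by the artifice of the third urn.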

3. A few examples will help us to understand the meaning and the 
use of Bayes’ formula. 

Example 1. The contents of urns 1, 2, 3 are as follows:

Urn 1: 1 white, 2 black, 3 red balls
Urn 2: 2 white, 1 black, 1 red balls
Urn 3: 4 white, 5 black, 3 red balls

One urn is chosen at random and two balls drawn. They happen to be white and red.
What is the probability that they came from urn 2 or 3?

Solution. The event A represents the fact that two balls taken from the selected
urn were of white and red color, respectively. To account for this fact, we have three
hypotheses: The selected urn was 1 or 2 or 3. We shall represent these hypotheses in
the order indicated by $B_1$, $B_2$, $B_3$. Since nothing distinguishes the urns, the probabilities
of these hypotheses before anything was known about A are

$$(B_1) = (B_2) = (B_3) = \tfrac{1}{3}.$$

The probabilities of A, assuming these hypotheses, are

$$(A, B_1) = \tfrac{1}{5}, \quad (A, B_2) = \tfrac{1}{3}, \quad (A, B_3) = \tfrac{2}{11}.$$

It remains now to introduce these values into formula (1) to have a posteriori
probabilities

$$(B_2, A) = \frac{\frac13\cdot\frac13}{\frac13\cdot\frac15 + \frac13\cdot\frac13 + \frac13\cdot\frac2{11}} = \frac{55}{118},$$

$$(B_3, A) = \frac{\frac13\cdot\frac2{11}}{\frac13\cdot\frac15 + \frac13\cdot\frac13 + \frac13\cdot\frac2{11}} = \frac{30}{118}$$

and also, naturally,

$$(B_1, A) = 1 - (B_2, A) - (B_3, A) = \frac{33}{118}.$$
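
The arithmetic of Example 1 can be reproduced exactly. In this sketch (helper names are illustrative) the conditional probability of drawing one white and one red ball is obtained by counting pairs, and Bayes' formula is then applied with equal a priori probabilities 1/3.

```python
from fractions import Fraction
from math import comb

urns = [(1, 2, 3), (2, 1, 1), (4, 5, 3)]   # (white, black, red) in urns 1, 2, 3

def white_and_red(white, black, red):
    """Probability that two balls drawn together are one white and one red."""
    return Fraction(white * red, comb(white + black + red, 2))

likelihoods = [white_and_red(*u) for u in urns]
joint = [Fraction(1, 3) * a for a in likelihoods]
posteriors = [x / sum(joint) for x in joint]
print(likelihoods)
print(posteriors)
```

Note that `Fraction` prints 30/118 in lowest terms, as 15/59.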


Example 2. It is known that an urn containing altogether 10 balls was filled in
the following manner: A coin was tossed 10 times, and according as it showed heads
or tails, one white or one black ball was put into the urn. Balls are drawn from this
urn one at a time, 10 times in succession (always being returned before the next drawing)
and every one turns out to be white. What is the probability that the urn contains
nothing but white balls?

Solution. The event A consists in the fact that in 10 independent trials with a
definite but unknown probability, only white balls appear. To account for this fact,
we have 10 hypotheses regarding the number of white balls in the urn; namely, that
this number is either 1, or 2, or 3, . . . or 10. The a priori probability of the hypothesis
$B_i$ that there are exactly i white balls in the urn, according to the manner in
which the urn was filled, is the same as the probability of having i heads in 10 throws;
that is,

$$(B_i) = \frac{10!}{i!\,(10-i)!}\,\frac{1}{2^{10}}; \quad i = 1, 2, \ldots 10.$$

Granted the hypothesis $B_i$, the probability of A is

$$(A, B_i) = \left(\frac{i}{10}\right)^{10}.$$

The problem requires us to find $(B_{10}, A)$. The expression of this probability immediately
results from Bayes' formula:

$$(B_{10}, A) = \frac{1}{\sum_{i=1}^{10} C_{10}^{i}\left(\frac{i}{10}\right)^{10}}.$$

The denominator of this fraction is

$$\sum_{i=1}^{10} C_{10}^{i}\left(\frac{i}{10}\right)^{10} = 14.247.$$

Hence

$$(B_{10}, A) = 0.0702.$$

This probability, although still small, is much greater than 1/1024, the a priori probability
of having only white balls in the urn.

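If, instead of 10 drawings, m drawings have been made and at each drawing white
balls appeared, the probability $(B_{10}, A)$ would be given by

$$(B_{10}, A) = \frac{1}{\sum_{i=1}^{10} C_{10}^{i}\left(\frac{i}{10}\right)^{m}}.$$

The denominator of this formula can be presented thus:

$$\sum_{i=0}^{9} C_{10}^{i}\left(1 - \frac{i}{10}\right)^{m}.$$

Now

$$\left(1 - \frac{i}{10}\right)^{m} \leq e^{-\frac{mi}{10}}$$

and so

$$\sum_{i=0}^{9} C_{10}^{i}\left(1 - \frac{i}{10}\right)^{m} < \sum_{i=0}^{10} C_{10}^{i}\,e^{-\frac{mi}{10}} = \left(1 + e^{-\frac{m}{10}}\right)^{10}.$$

Hence

$$(B_{10}, A) > \left(1 + e^{-\frac{m}{10}}\right)^{-10}.$$

This shows that with increasing m the probability $(B_{10}, A)$ rapidly approaches 1.
For instance, if m = 100

$$(B_{10}, A) > (1 + e^{-10})^{-10} > (1.0000454)^{-10} > 0.99954.$$

Thus, after 100 drawings producing only white balls, it is almost certain that the
urn contains nothing but white balls, a conclusion which mere common sense would
dictate.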
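
The figures of Example 2 are easy to reproduce numerically. The sketch below uses illustrative function names. It computes the a posteriori probability that all 10 balls are white after m white drawings, and compares it with the lower bound $(1 + e^{-m/10})^{-10}$. That form of the bound is an assumption on our part: it is the one that reproduces the figure 1.0000454 quoted for m = 100.

```python
from math import comb, exp

def prob_all_white(m, balls=10):
    """A posteriori probability that the urn is all white after m white draws."""
    denom = sum(comb(balls, i) * (i / balls) ** m for i in range(1, balls + 1))
    return 1.0 / denom

def lower_bound(m, balls=10):
    """Lower bound (1 + e^(-m/balls))^(-balls) for the same probability."""
    return (1.0 + exp(-m / balls)) ** (-balls)

print(round(prob_all_white(10), 4))
print(prob_all_white(100), lower_bound(100))
```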

Example 3. Two urns, 1 and 2, contain respectively 2 white and 1 black ball,
and 1 white and 5 black balls. One ball is transferred from urn 1 to urn 2 and then
one ball is drawn from the latter. It happens to be white. What is the probability
that the transferred ball was black?

Solution. Here we have two hypotheses: $B_1$, that the transferred ball was black,
and $B_2$, that it was white. The a priori probabilities of these hypotheses are

$$(B_1) = \tfrac{1}{3}, \quad (B_2) = \tfrac{2}{3}.$$

The probabilities of drawing a white ball from urn 2, granted that $B_1$ or $B_2$ is true,
are:

$$(A, B_1) = \tfrac{1}{7}, \quad (A, B_2) = \tfrac{2}{7}.$$

The probability of $B_1$, after a white ball has been drawn from the second urn,
results from Bayes' formula:

$$(B_1, A) = \frac{\frac13\cdot\frac17}{\frac13\cdot\frac17 + \frac23\cdot\frac27} = \frac{1}{5}.$$

4. Problem 2. Retaining the notations, conditions, and data of
Prob. 1, find the probability of materialization of another event C
granted that A has actually occurred. Conditional probabilities

$(C, AB_i)$; $i = 1, 2, \ldots n$

are supposed to be known.

Solution. Since the fact of the occurrence of A involves that of one,
and only one, of the events

$B_1, B_2, \ldots B_n$,

the event C (granted the occurrence of A) can materialize in the following
mutually exclusive forms

$CB_1, CB_2, \ldots CB_n$.

Consequently, the probability (C, A) which we are seeking is given by

$$(C, A) = (CB_1, A) + (CB_2, A) + \cdots + (CB_n, A).$$

Applying the theorem of compound probability, we have

$$(CB_i, A) = (B_i, A)(C, B_iA)$$

and

$$(C, A) = (B_1, A)(C, AB_1) + (B_2, A)(C, AB_2) + \cdots + (B_n, A)(C, AB_n).$$

It suffices now to substitute for

$(B_i, A)$

its expression given by Bayes' formula, to find the final expression

$$(C, A) = \frac{\sum_{i=1}^{n}(B_i)(A, B_i)(C, AB_i)}{\sum_{i=1}^{n}(B_i)(A, B_i)}. \tag{2}$$
It may happen that the materialization of hypothesis $B_i$ makes C
independent of A; then we have simply

$$(C, AB_i) = (C, B_i)$$

and instead of formula (2), we have a simplified formula

$$(C, A) = \frac{\sum_{i=1}^{n}(B_i)(A, B_i)(C, B_i)}{\sum_{i=1}^{n}(B_i)(A, B_i)}. \tag{3}$$

The event C can be considered in regard to A as a future event. For
that reason formulas (2) and (3) express probabilities of future events.
For better understanding of these commonly used technical terms, we
shall consider a simple example.

Example 4. From an urn containing 3 white and 5 black balls, 4 balls are transferred
into an empty urn. From this urn 2 balls are taken and they both happen to
be white. What is the probability that the third ball taken from the same urn will
be white?

Solution. (a) Let us suppose that the two balls drawn in the first place are returned
to the second urn. Analyzing this problem, we distinguish first the following hypotheses
concerning colors of the 4 balls transferred from the first urn. Among them, there
are necessarily 2 white balls. Hence, there are only two possible hypotheses:

$B_1$: 2 white and 2 black balls;

$B_2$: 3 white and 1 black ball.


A priori probabilities of these hypotheses are

$$(B_1) = \frac{3}{7}, \quad (B_2) = \frac{1}{14}.$$

The event A consists in the white color of both balls drawn from the second urn.
The conditional probabilities $(A, B_1)$ and $(A, B_2)$ are

$$(A, B_1) = \tfrac{1}{6}; \quad (A, B_2) = \tfrac{1}{2}.$$

The future event C consists in the white color of the third ball. Since the 2 balls
drawn at first are returned, C becomes independent of A as soon as it is known which
one of the hypotheses has materialized. Hence

$$(C, AB_1) = (C, B_1) = \tfrac{1}{2},$$

$$(C, AB_2) = (C, B_2) = \tfrac{3}{4}.$$

Substituting these various numbers in formula (3), we find that

$$(C, A) = \frac{\frac37\cdot\frac16\cdot\frac12 + \frac1{14}\cdot\frac12\cdot\frac34}{\frac37\cdot\frac16 + \frac1{14}\cdot\frac12} = \frac{7}{12}.$$
(b) If the two balls drawn in the first place are not returned, we have

$$(C, AB_1) = 0, \quad (C, AB_2) = \tfrac{1}{2}.$$

Then, making use of formula (2),

$$(C, A) = \frac{\frac1{14}\cdot\frac12\cdot\frac12}{\frac37\cdot\frac16 + \frac1{14}\cdot\frac12} = \frac{1}{6}.$$
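
Both parts of Example 4 follow from a single evaluation of formula (2), of which formula (3) is the special case $(C, AB_i) = (C, B_i)$. The sketch below is illustrative: the function name `future_probability` is ours, and the urn is taken to hold 3 white and 5 black balls, the composition under which exactly the two hypotheses $B_1$ and $B_2$ are possible.

```python
from fractions import Fraction
from math import comb

def future_probability(priors, like_a, like_c):
    """Formula (2): (C, A) from (B_i), (A, B_i) and (C, AB_i)."""
    num = sum(b * a * c for b, a, c in zip(priors, like_a, like_c))
    den = sum(b * a for b, a in zip(priors, like_a))
    return num / den

# Hypotheses: the 4 transferred balls contain w = 2 or w = 3 white balls.
priors = [Fraction(comb(3, w) * comb(5, 4 - w), comb(8, 4)) for w in (2, 3)]
like_a = [Fraction(comb(w, 2), comb(4, 2)) for w in (2, 3)]   # two white drawn

returned = future_probability(priors, like_a, [Fraction(2, 4), Fraction(3, 4)])
kept_out = future_probability(priors, like_a, [Fraction(0, 2), Fraction(1, 2)])
print(returned, kept_out)
```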


5. The following problem can easily be solved by direct application
of Bayes' formula.

Problem 3. A series of trials is performed, which, with certain 
additional data, would appear as independent trials in regard to an event 
E with a constant probability p. 

Lacking these data, all we know is that the unknown probability p
must be one of the numbers

$p_1, p_2, \ldots p_k$

and we can assume these values with the respective probabilities

$\alpha_1, \alpha_2, \ldots \alpha_k$.

In n trials the event E actually occurred m times. What is the probability
that p lies between the two given limits $\alpha$ and $\beta$ ($0 \leq \alpha < \beta \leq 1$),
or else, what is the probability of the following inequalities:

$$\alpha \leq p \leq \beta?$$

A particular case may illustrate the meaning of this problem. In a
set of N urns, $N\alpha_1$ urns have white balls in proportion $p_1$ to the total
number of balls; $N\alpha_2$ urns have white balls in proportion $p_2$; . . . $N\alpha_k$
urns have white balls in proportion $p_k$. An urn is chosen at random and
n drawings of one ball at a time are performed, the ball being returned
each time before the next drawing so as to keep a constant proportion
of white balls. It is found that altogether m white balls have appeared.
What is the probability that one of the $N\alpha_i$ urns with the proportion
$p_i$ of white balls was chosen? Evidently this is a particular case of the
general problem, and here we possess knowledge of the necessary data,
provided that the probability of selecting any one of the urns is the same.

Solution. We distinguish k exhaustive and mutually exclusive
hypotheses that the unknown probability is $p_1$, or $p_2$, . . . or $p_k$. The
a priori probabilities of these hypotheses are, respectively, $\alpha_1, \alpha_2, \ldots \alpha_k$.
Assuming the hypothesis $p = p_i$, the probability of the event E occurring
m times in n trials is

$$C_n^m p_i^m (1 - p_i)^{n-m}.$$

Now, after E has actually happened m times in n trials, the a posteriori
probability of the hypothesis $p = p_i$, by virtue of Bayes' formula,
will be

$$\frac{C_n^m \alpha_i p_i^m (1 - p_i)^{n-m}}{\sum_{j=1}^{k} C_n^m \alpha_j p_j^m (1 - p_j)^{n-m}}$$

or, canceling $C_n^m$,

$$\frac{\alpha_i p_i^m (1 - p_i)^{n-m}}{\sum_{j=1}^{k} \alpha_j p_j^m (1 - p_j)^{n-m}}.$$

Now, applying the theorem of total probability, the probability P of the
inequalities

$$\alpha \leq p \leq \beta$$

will be given by

$$P = \frac{\sum \alpha_i p_i^m (1 - p_i)^{n-m}}{\sum_{i=1}^{k} \alpha_i p_i^m (1 - p_i)^{n-m}} \tag{4}$$

where the summation in the numerator refers to all values of $p_i$ lying
between $\alpha$ and $\beta$, limits included.

An important particular case arises when the set of hypothetical
probabilities is

$$p_1 = \frac{1}{k}, \quad p_2 = \frac{2}{k}, \quad \ldots \quad p_k = \frac{k}{k} = 1$$

and the a priori probabilities of these hypotheses are equal:

$$\alpha_1 = \alpha_2 = \cdots = \alpha_k = \frac{1}{k}.$$

Then the fraction 1/k can be canceled in both numerator and denominator.
The final formula for the probability of the inequalities

$$\alpha \leq p \leq \beta$$

will be

$$P = \frac{\sum p_i^m (1 - p_i)^{n-m}}{\sum_{i=1}^{k} p_i^m (1 - p_i)^{n-m}}, \tag{5}$$

summation in numerator being extended over all positive integers i
satisfying the inequalities

$$k\alpha \leq i \leq k\beta.$$

In the limit, when k tends to infinity, the a priori probability of the
inequalities

$$\alpha \leq p \leq \beta$$

is given simply by the length $\beta - \alpha$ of the interval $(\alpha, \beta)$. The a posteriori
probability of the same inequalities is obtained as the limit of
expression (5). Now, as $k \to \infty$, the sums

$$\frac{1}{k}\sum_{k\alpha \leq i \leq k\beta} p_i^m (1 - p_i)^{n-m} \quad \text{and} \quad \frac{1}{k}\sum_{i=1}^{k} p_i^m (1 - p_i)^{n-m}$$

tend to the definite integrals

$$\int_\alpha^\beta x^m (1 - x)^{n-m}\,dx \quad \text{and} \quad \int_0^1 x^m (1 - x)^{n-m}\,dx.$$

Therefore, in the limit, the a posteriori probability of the inequalities

$$\alpha \leq p \leq \beta$$

is expressed by the ratio of two definite integrals

$$P = \frac{\int_\alpha^\beta x^m (1 - x)^{n-m}\,dx}{\int_0^1 x^m (1 - x)^{n-m}\,dx}. \tag{6}$$

This formula leads to the following conclusion: When the unknown
probability p of an event E may have any value between 0 and 1 and the a
priori probability of its being contained between limits $\alpha$ and $\beta$ is $\beta - \alpha$,
then after n trials in which E occurred m times, the a posteriori probability
of p being contained between $\alpha$ and $\beta$ is given by formula (6).
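
The passage from formula (5) to formula (6) can be watched numerically. In the sketch below (function names are illustrative) the discrete expression is evaluated for equally likely hypotheses $p_i = i/k$ and compared with the ratio of integrals, which is computed exactly by expanding $(1-x)^{n-m}$ with the binomial theorem. The values n = 10, m = 7 and the limits 1/4 and 3/4 are arbitrary choices made for the demonstration.

```python
from fractions import Fraction
from math import comb

def posterior_discrete(n, m, k, lo, hi):
    """Formula (5): hypotheses p_i = i/k, i = 1..k, equally likely a priori."""
    def weight(i):
        p = Fraction(i, k)
        return p**m * (1 - p)**(n - m)
    num = sum(weight(i) for i in range(1, k + 1) if lo <= Fraction(i, k) <= hi)
    den = sum(weight(i) for i in range(1, k + 1))
    return num / den

def beta_integral(n, m, upper):
    """Exact integral of x^m * (1-x)^(n-m) from 0 to `upper`."""
    return sum(Fraction((-1)**j * comb(n - m, j), m + j + 1) * upper**(m + j + 1)
               for j in range(n - m + 1))

def posterior_continuous(n, m, lo, hi):
    """Formula (6): ratio of the two definite integrals."""
    whole = beta_integral(n, m, Fraction(1))
    return (beta_integral(n, m, hi) - beta_integral(n, m, lo)) / whole

lo, hi = Fraction(1, 4), Fraction(3, 4)
for k in (4, 40, 400):
    print(k, float(posterior_discrete(10, 7, k, lo, hi)))
print("limit", float(posterior_continuous(10, 7, lo, hi)))
```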

6. Problem 4. Assumptions and data being the same as in Prob. 3,
find the probability that in $n_1$ trials, following n trials which produced
E m times, the same event will occur $m_1$ times.

Solution. It suffices to take in formula (3)

$$(B_i) = \alpha_i; \quad (A, B_i) = C_n^m p_i^m (1 - p_i)^{n-m}$$

and

$$(C, B_i) = C_{n_1}^{m_1} p_i^{m_1} (1 - p_i)^{n_1 - m_1}$$

to find for the required probability this expression:

$$Q = C_{n_1}^{m_1}\,\frac{\sum_{i=1}^{k} \alpha_i p_i^{m+m_1} (1 - p_i)^{n+n_1-m-m_1}}{\sum_{i=1}^{k} \alpha_i p_i^{m} (1 - p_i)^{n-m}}. \tag{7}$$

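Supposing again

$$p_1 = \frac{1}{k}, \quad p_2 = \frac{2}{k}, \quad \ldots \quad p_k = 1$$

$$\alpha_1 = \alpha_2 = \cdots = \alpha_k = \frac{1}{k}$$

and letting $k \to \infty$, formula (7) in the limit becomes

$$Q = C_{n_1}^{m_1}\,\frac{\int_0^1 x^{m+m_1} (1 - x)^{n+n_1-m-m_1}\,dx}{\int_0^1 x^{m} (1 - x)^{n-m}\,dx}. \tag{8}$$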

This formula leads to the following conclusion: When the unknown
probability p of an event E may have any value between limits 0 and 1
and the a priori probability of its being contained between $\alpha$ and $\beta$ is
$\beta - \alpha$ (so that equal probabilities correspond to intervals of equal length),
the probability that the event E will happen $m_1$ times in $n_1$ trials following
n trials which produced E m times is given by formula (8).

In particular, for $n_1 = m_1 = 1$ (evaluating integrals by the known
formula), we have

$$Q = \frac{m + 1}{n + 2}.$$

This is the much disputed "law of succession" established by Laplace.
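
Formula (8) can be evaluated exactly with the known value of the integral, $\int_0^1 x^a(1-x)^b\,dx = a!\,b!/(a+b+1)!$. The sketch below (function names are illustrative) does this and confirms the law of succession for one choice of n and m.

```python
from fractions import Fraction
from math import comb, factorial

def beta(a, b):
    """Integral of x^a * (1-x)^b over (0, 1), equal to a! b! / (a + b + 1)!."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

def future_count(n, m, n1, m1):
    """Formula (8): probability of m1 occurrences in n1 further trials."""
    return comb(n1, m1) * beta(m + m1, n + n1 - m - m1) / beta(m, n - m)

# Law of succession: with n1 = m1 = 1 the result is (m + 1)/(n + 2).
print(future_count(10, 7, 1, 1))
```

As a consistency check, the values of `future_count(n, m, n1, m1)` for m1 = 0, 1, . . . n1 add up to 1.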

7. Bayes' formula, and other conclusions derived from it, are necessary
consequences of fundamental concepts and theorems of the theory of
probability. Once we admit these fundamentals, we must admit Bayes'
formula and all that follows from it.

But the question arises: When may the various results established
in this chapter be legitimately applied? In general, they may be applied
whenever all the conditions of their validity are fulfilled; and in some
artificial theoretical problems like those considered in this chapter, they
unquestionably are legitimately applied. But in the case of practical
applications it is not easy to make sure that all the conditions of validity
are fulfilled, though there are some practical problems in which the use
of Bayes' formula is perfectly legitimate.¹ In the history of probability
it has happened that even the most illustrious men, like Laplace and
Poisson, went farther than they were entitled to go and made free use
principally of formulas (6) and (8) in various important practical problems.
Against the indiscriminate use of these formulas sharp objections
have been raised by a number of authors, especially in modern times.

The first objection is of a general nature and hits the very existence
of a priori probabilities. If an urn is given to us and we know only that
it contains white and black balls, it is evident that no means are available
to estimate a priori probabilities of various hypotheses as to the proportion
of white balls. Hence, critics say, a priori probabilities do not exist
at all, and it is futile to attempt to apply Bayes' formula to an urn with
an unknown proportion of balls. At first this objection may appear
very convincing, but its force is somewhat lessened by considering the
peculiar mode of existence of mathematical objects.

¹ One such problem can be found in an excellent book by Thornton C. Fry, "Probability
and Its Engineering Uses," New York, 1928.

Some property of integers, unknown to me, is not present in my 
mind, but it is hardly permissible to say that it does not exist; for it does 
exist in the minds of those who discover this property and know how to 
prove it. 

Similarly, our urn might have been filled by some person, or selected
from among urns with known contents. To this person the a priori
probabilities of various proportions of white and black balls might
have been known. To us they are unknown, but this should not prevent
us from attributing to them some potential mode of existence at least as
a sort of belief.

To admit a belief in the existence of certain unknown numbers is
common to all sciences where mathematical analysis is applied to the
world of reality. If we are allowed to introduce the element of belief
into such "exact" sciences as astronomy and physics, it would be only
fair to admit it in practical applications of probability.

The second and very serious objection is directed against the use of
formula (6), and for similar reasons against formula (8). Imagine,
again, that we are provided with an urn containing an enormous number
of white and black balls in completely unknown proportion. Our aim
is to find the probability that the proportion of white balls to the total
number of balls is contained between two given limits. To that end, we
make a long series of trials as described in Prob. 3 and find that actually
in n trials, white balls appeared m times. The probability we seek would
result from Bayes' formula, provided numerical values of a priori probabilities,
assumed on belief to be existent, were known. Lacking such
knowledge, an arbitrary assumption is made, namely, that all the a
priori probabilities have the same value. Then, on account of the
enormous number of balls in our urn, formula (6) can be used as an
approximate expression of P. It can be shown that, given an arbitrary
positive number $\epsilon$, however small, the probability of the inequalities

$$\frac{m}{n} - \epsilon < p < \frac{m}{n} + \epsilon$$

can be made as near to 1 as we please by taking the number of trials
greater than a certain number $N(\epsilon)$ depending upon $\epsilon$ alone. In other
words, with practical certainty we can expect the proportion of white
balls to the total number of balls in our urn to be contained within
arbitrarily narrow limits

$$\frac{m}{n} - \epsilon \quad \text{and} \quad \frac{m}{n} + \epsilon.$$

A conclusion like this would certainly be of the greatest importance.
But it is vitiated by the arbitrary assumption made at the beginning.
The same is true of formula (8) and of Laplace's "law of succession."
The objection against using formulas (6) and (8) in circumstances where
we are not entitled to use them appears to us as irrefutable, and the
numerical applications made by Laplace and others cannot inspire much
confidence.

As an example of the extremes to which the illegitimate use of formulas
(6) and (8) may lead, we quote from Laplace:

En faisant, par exemple, remonter la plus ancienne époque de l'histoire à
cinq mille ans, ou à 1,826,213 jours, et le Soleil s'étant levé constamment, dans
cet intervalle, à chaque révolution de vingt-quatre heures, il y a 1,826,214 à parier
contre un qu'il se lèvera encore demain.

[Placing, for example, the most ancient epoch of history at five thousand years
ago, or at 1,826,213 days, and the Sun having risen constantly, in this interval,
at each revolution of twenty-four hours, it is a bet of 1,826,214 to one that it
will rise again tomorrow.]

It appears strange that as great a man as Laplace could make such a
statement in earnest. However, under proper conditions, it would
not be so objectionable. If, from the enormous number N + 1 of
urns containing each N black and white balls in all possible proportions,
one urn is taken and 1,826,213 balls are drawn and returned, and they
all turn out to be white, then nobody can deny that there are very nearly
1,826,214 chances against one that the next ball will also be white.

Problems for Solution 

1. Three urns of the same appearance have the following proportions of white and
black balls:

Urn 1: 1 white, 2 black balls
Urn 2: 2 white, 1 black ball
Urn 3: 2 white, 2 black balls

One of the urns is selected and one ball is drawn. It turns out to be white. What
is the probability that the third urn was chosen? Ans. 1/3.

2. Under the same conditions, what is the probability of drawing a white ball
again, the first one not having been returned? Ans. 1/3.

3. An urn containing 5 balls has been filled up by taking 5 balls from another urn,
which originally had 5 white and 5 black balls. A ball is taken from the first urn, and
it happens to be black. What is the probability of drawing a white ball from among
the remaining 4? Ans. 5/9.

4. From an urn containing 5 white and 5 black balls, 5 balls are transferred into an
empty second urn. From there, 3 balls are transferred into an empty third urn and,
finally, one ball is drawn from the latter. It turns out to be white. What is the
probability that all 5 balls transferred from the first urn are white? Ans. 1/126.

5. Conditions and notations being the same as in Prob. 3 (page 66), show that the
probability for an event to occur in the (n + 1)st trial, granted that it has occurred
in all the preceding n trials, is never less than the probability for the same event to
occur in the nth trial, granting that it has occurred in the preceding n − 1 trials.

Hint: it must be proved that

$$\sum_{i=1}^{k} \alpha_i p_i^{n-1} \cdot \sum_{i=1}^{k} \alpha_i p_i^{n+1} \geq \left(\sum_{i=1}^{k} \alpha_i p_i^{n}\right)^2.$$

For that purpose, use Cauchy's inequality

$$\left(\sum_{i=1}^{k} a_i b_i\right)^2 \leq \sum_{i=1}^{k} a_i^2 \cdot \sum_{i=1}^{k} b_i^2.$$

6. Assuming that the unknown probability p of an event E can have any value
between 0 and 1 and that the a priori probability of its being contained in the interval
$(\alpha, \beta)$ is equal to the length of this interval, prove the following theorem: The probability
a posteriori of the inequality

$$p \leq \sigma$$

after E has occurred m times in n trials is equal to the probability of at least m + 1
successes in n + 1 independent trials with constant probability $\sigma$. (See Prob. 13,
page 59.)

7. Assumptions being the same as in the preceding problem, find approximately 
the probability a posteriori of the inequalities 

^ p ^ yg, 

it being known that in 200 trials an event with the probability p has occurred 105
times. Ans. Using the preceding problem and applying Markoff's method, we find
P = 0.846.

8. An urn contains N white and black balls in unknown proportion. The number
of white balls hypothetically may be

0, 1, 2, . . . N

and all these hypotheses are considered as equally likely. Altogether n balls are
taken from the urn, m of which turned out to be white. Without returning these
balls, a new group of $n_1$ balls is taken, and it is required to find the probability that
among them there are $m_1$ white balls. Naturally, the total number of balls is so
large as to have $n + n_1 < N$. Ans. The required probability has the same expression
as in Prob. 4, page 69.

Polynomials ordinarily called "Hermite's polynomials," although they were discovered
by Laplace, are defined by

$$H_n(y) = e^{\frac{y^2}{2}}\,\frac{d^n e^{-\frac{y^2}{2}}}{dy^n}.$$

The first four of them are

$$H_1(y) = -y; \quad H_2(y) = y^2 - 1; \quad H_3(y) = -y^3 + 3y; \quad H_4(y) = y^4 - 6y^2 + 3.$$

They possess the remarkable property of orthogonality:

$$\int_{-\infty}^{\infty} e^{-\frac{y^2}{2}} H_m(y) H_n(y)\,dy = 0 \quad \text{when} \quad m \neq n$$

while

$$\int_{-\infty}^{\infty} e^{-\frac{y^2}{2}} H_n(y)^2\,dy = \sqrt{2\pi}\,n!.$$

Under very general conditions, a function f(y) defined in the interval $(-\infty, +\infty)$
can be represented by a series


f{y) = Oo + aJIiiy) + aill^iy) + 


where in general 


Lfd, 


1 

- - 7 - V(J 

kW2nJ. ^ 


'f{y)Hkiy)dy. 


id 


/,2 ^ 


7 ! «(1 — a ) 

provided 0 < « < 1. 

9. Prove the validity of the following expansion indicated by Ch. Jordan:


[n^±21[ 

7n\(n — 771.) \ 






1 - 2a 

1 - --hlhiy) + 

7^+2 

2n - (lln +6)a(l - ^ , 

hHIiiy) + 


2n{n + 2)(n +3) 

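for $0 \leq x \leq 1$ where y is a new variable connected to x by the equation

$$x = \alpha + \frac{y}{h}.$$

Hint: Consider the development in a series of Hermite's polynomials of the
function

$$f(y) = \left(\alpha + \frac{y}{h}\right)^{m}\left(1 - \alpha - \frac{y}{h}\right)^{n-m} \quad \text{for} \quad -h\alpha \leq y \leq h(1 - \alpha)$$

$$f(y) = 0 \quad \text{if either} \quad y < -h\alpha \quad \text{or} \quad y > h(1 - \alpha).$$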

10. Assuming that the conditions of validity of formula (6) are fulfilled, show that
the a posteriori probability of the inequalities


^4 


(1 — «) 


<p <~ +t 

n 


lad — a) 

\ n 


7n 

n 


can be expanded into a convergent series


P = 






le ^ 2ri — (IIti + 6)a(l — a) 

■\/2Tr + 2)(71 + 3)a(l — a) 


+ 


When n is large and $\alpha$ is not near either to 0 or to 1, two terms of this series suffice
to give a good approximation to P (Ch. Jordan). Apply this to Prob. 7.

Ans. 0.84585. 
CHAPTER V


USE OF DIFFERENCE EQUATIONS IN SOLVING PROBLEMS 

OF PROBABILITY 

1. The combined use of the theorems of total and compound probability
very often leads to an equation in finite differences which, together
with the initial conditions supplied by a problem itself, serves to determine
an unknown probability. This method of attack is very powerful,
and it is often resorted to, especially in the more difficult cases. In this
chapter the use of equations in finite differences, applied to a few selected
and comparatively easy examples, will be shown; but in Chap. VIII
we shall apply the method to a class of interesting and historically
important problems.

Certain preliminary explanations are necessary at this point. Again
we consider a series of trials resulting in an event E or its opposite, F,
but this time we suppose that the trials are dependent, so that the
probability of E at a certain trial may vary according to the available
information concerning the results of some of the other trials.

A simple and interesting case of dependent trials arises if we suppose
that the probability of E in the (n + 1)st trial receives a definite value
$\alpha$ if E has happened in the preceding nth trial, and this value does not
change whatever further information we may possess concerning the
results of trials preceding the nth. Also, the probability of E in the
(n + 1)st trial receives another determined value $\beta$ if E failed in
the nth trial, no matter what happened in the trials preceding the nth.

We have a simple illustration of this kind of dependence, if we suppose 
that drawings are made from an urn containing black and white balls in 
a known proportion, and that each ball drawn is returned to the urn, but 
only after the next drawing has been made. It is obvious that the proba¬ 
bility that the (n + l)st ball drawn will be white, becomes perfectly 
definite if we know what was the color of the ball immediately preceding, 
and it remains the same no matter what we know about the colors of the 
1, 2, . . . (n − 1)st balls.

If the trials depend on each other in the above-defined manner, we
say that they constitute a "simple chain," to use the terminology of the
late A. A. Markoff, who was the first to make a profound study of
dependent trials of this and similar, but more complicated, types. It is
implied in the definition of a simple chain that it breaks into two separate
parts as soon as the result of a certain trial becomes known. For
instance, if the result of the fifth trial is known, trials 6, 7, 8, . . . become
independent of trials 1, 2, 3, 4, and the chain breaks into two distinct
parts: the trials preceding the fifth, and those following it. If the
results of trials 1, 2, 3, . . . (n − 1) remain unknown, the event E
in the following nth trial has a certain probability which we shall denote
by $p_n$. Also, if it becomes known that E happened at trial k, where
k < n − 1, the probability of E happening in the nth trial receives a
different value, $p_n^{(k)}$. It is important to find means to determine the
probability $p_n$, the a priori probability of E in the nth trial when the
results of the preceding trials remain unknown; as well as to determine
the probability $p_n^{(k)}$ of E in the nth trial when we possess the positive
information that E has materialized in the kth (k < n − 1) trial.

2. Thus we are led to the following problem concerning simple chains
of dependent trials:

Problem 1. The initial probability $p_1$ of the event E in a simple
chain of trials being known, find the probability $p_n$ of E in the nth trial
when the results of the preceding trials remain completely unknown.
Also, find the probability $p_n^{(k)}$ of E in the nth trial when it is known that
E has happened in the kth trial where k < n − 1.

Solution. In the nth trial the event E can happen either preceded
by E in the (n − 1)st trial, the probability of which is $p_{n-1}$, or preceded
by F in the (n − 1)st trial, the probability of which is $1 - p_{n-1}$. By
the theorem of compound probability, the probability of the succession
EE is $p_{n-1}\alpha$, while the probability of the succession FE is $(1 - p_{n-1})\beta$.
Hence, the total probability $p_n$ is

$$p_n = \alpha p_{n-1} + \beta(1 - p_{n-1}) = (\alpha - \beta)p_{n-1} + \beta. \tag{1}$$

This is an ordinary equation in finite differences. It has a particular
solution

$$p_n = c = \text{const.}$$

where c is determined by the equation

$$c = (\alpha - \beta)c + \beta$$

whence

$$c = \frac{\beta}{1 + \beta - \alpha}$$

provided $1 + \beta - \alpha \neq 0$.¹ On the other hand, the corresponding
homogeneous equation

$$y_n = (\alpha - \beta)y_{n-1}$$

has a general solution

$$y_n = C(\alpha - \beta)^{n-1}$$

involving an arbitrary constant C. Adding to it the previously found
particular solution, we obtain the general solution of (1) in the form

$$p_n = C(\alpha - \beta)^{n-1} + \frac{\beta}{1 + \beta - \alpha}.$$

The arbitrary constant C is determined by the initial condition

$$C + \frac{\beta}{1 + \beta - \alpha} = p_1$$

so that finally

$$p_n = \frac{\beta}{1 + \beta - \alpha} + \left(p_1 - \frac{\beta}{1 + \beta - \alpha}\right)(\alpha - \beta)^{n-1}.$$

If

$$p_1 = \frac{\beta}{1 + \beta - \alpha}$$

we see that $p_n$ does not depend on n and is constantly equal to $p_1$. Because
we may exclude the cases $\alpha - \beta = 1$ or $\alpha - \beta = -1$, so that
$\alpha - \beta$ is contained between −1 and 1, we may conclude from the above
expression that $p_n$, if not a constant, at any rate tends to the limit

$$\frac{\beta}{1 + \beta - \alpha}$$

as n increases indefinitely.

¹ If $1 + \beta - \alpha = 0$ or $\alpha - \beta = 1$, we necessarily have $\alpha = 1$, $\beta = 0$, which
means that E must occur in all the trials if it actually occurs in the first trial, and
never occurs if it does not actually occur at the outset. This case, as well as the other
extreme case in which $\alpha - \beta = -1$, can therefore be excluded as not possessing real
interest.

As to $p_n^{(k)}$, we find in a similar way that it satisfies the equation

$$p_n^{(k)} = \alpha p_{n-1}^{(k)} + \beta\left(1 - p_{n-1}^{(k)}\right) \tag{2}$$

of the same form as equation (1). But the initial condition in this
case is $p_{k+1}^{(k)} = \alpha$ because the probability of $E$ happening in the $(k+1)$st
trial is $\alpha$ when it is known that $E$ occurred in the preceding trial. The
solution of (2) satisfying this initial condition is

$$p_n^{(k)} = \frac{\beta}{1 + \beta - \alpha} + \frac{1 - \alpha}{1 + \beta - \alpha}(\alpha - \beta)^{n-k}.$$

As the second term in the right-hand member decreases in absolute value with
increasing $n$ and finally becomes less than any given number, we see that the
positive information concerning the result of the $k$th trial has less and less
influence on the probability of $E$ in the following trials, and in remote
trials this influence becomes quite insignificant.
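The two closed forms above can be checked against a direct iteration of the difference equation. The following is a minimal sketch in Python, not part of the original text; the function names and the sample values of $p_1$, $\alpha$, $\beta$ are illustrative assumptions, and exact fractions are used so that the comparison involves no rounding.

```python
from fractions import Fraction


def chain_probability(p_start, alpha, beta, n):
    """Iterate p_n = alpha*p_{n-1} + beta*(1 - p_{n-1}) starting from p_1 = p_start."""
    p = p_start
    for _ in range(n - 1):
        p = alpha * p + beta * (1 - p)
    return p


def chain_closed_form(p1, alpha, beta, n):
    """Closed form for p_n obtained from the general solution of (1)."""
    limit = beta / (1 + beta - alpha)
    return limit + (p1 - limit) * (alpha - beta) ** (n - 1)


def conditional_closed_form(alpha, beta, n, k):
    """Closed form for p_n^(k): E is known to have occurred in trial k."""
    limit = beta / (1 + beta - alpha)
    return limit + (1 - alpha) / (1 + beta - alpha) * (alpha - beta) ** (n - k)


# Illustrative (hypothetical) values, not taken from the text.
p1, alpha, beta = Fraction(1, 5), Fraction(3, 4), Fraction(1, 3)
for n in range(1, 8):
    print(n, chain_probability(p1, alpha, beta, n), chain_closed_form(p1, alpha, beta, n))
```

For the conditional probability, the same iteration can be started from $\alpha$ at trial $k+1$, that is, `chain_probability(alpha, alpha, beta, n - k)`.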


Example. An urn contains $a$ white and $b$ black balls, and a series of drawings of
one ball at a time is made, the ball removed being returned to the urn immediately
after the taking of the next following ball. What is the probability that the $n$th ball
drawn is white when: (a) nothing is known about the preceding drawings; (b) the $k$th
ball drawn is white?

In this particular problem we have

$$\alpha = \frac{a-1}{a+b-1}, \qquad \beta = \frac{a}{a+b-1}, \qquad p_1 = \frac{a}{a+b}$$

and

$$\frac{\beta}{1 + \beta - \alpha} = \frac{a}{a+b} = p_1.$$

Thus

$$p_n = p_1 = \frac{a}{a+b}.$$

That is, the probability for any ball drawn to be white is the same as that for the
first ball, nothing being known about the results of the previous drawings. The
expression for $p_n^{(k)}$ is, in this example,

$$p_n^{(k)} = \frac{a}{a+b} + \frac{(-1)^{n-k}\, b}{(a+b)(a+b-1)^{n-k}}.$$

So, for instance, if $a = 1$, $b = 2$, $n = 5$, $k = 3$,

$$p_5^{(3)} = \frac{1}{3} + \frac{2}{3 \cdot 2^2} = \frac{1}{2};$$

the information that the third ball was white raises to $\tfrac12$ the probability that the fifth
ball will be white; it would be $\tfrac13$ without such information.
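The numerical case above is small enough to verify by brute force. The sketch below is an illustration added here, not the author's method: it labels the balls, uses the fact that every sequence of draws in which no ball is drawn twice in a row is equally likely, and counts. The function name and the labelling convention (balls `0..a-1` are white) are assumptions of the sketch.

```python
from fractions import Fraction
from itertools import product


def urn_probabilities(a, b, n, k):
    """Return (P(nth ball white), P(nth ball white | kth ball white)) by enumeration.

    The ball just drawn is outside the urn during the next draw, so a valid
    sequence of labelled balls never repeats a ball in consecutive positions,
    and all valid sequences are equally likely.
    """
    balls = range(a + b)  # balls 0..a-1 are white, the rest black
    total = white_n = white_k = white_both = 0
    for seq in product(balls, repeat=n):
        if any(seq[i] == seq[i + 1] for i in range(n - 1)):
            continue  # the same ball cannot be drawn twice in a row
        total += 1
        nth_white = seq[n - 1] < a
        kth_white = seq[k - 1] < a
        white_n += nth_white
        white_k += kth_white
        white_both += nth_white and kth_white
    return Fraction(white_n, total), Fraction(white_both, white_k)


print(urn_probabilities(1, 2, 5, 3))
```

The enumeration grows as $(a+b)^n$, so it is only suitable for small cases such as this one.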


3. The next problem chosen to illustrate the use of difference equations
is interesting in several respects. It was first propounded and
solved by de Moivre.

Problem 2. In a series of independent trials, an event $E$ has the
constant probability $p$. If, in this series, $E$ occurs at least $r$ times in
succession, we say that there is a run of $r$ successes. What is the probability
of having a run of $r$ successes in $n$ trials, where naturally $n > r$?

Solution. Let us denote by $y_n$ the unknown probability of a run of
$r$ in $n$ trials. In $n + 1$ trials the probability of a run of $r$ will then be
$y_{n+1}$. Now, a run of $r$ in $n + 1$ trials can happen in two mutually
exclusive ways: first, if there is a run of $r$ in the first $n$ trials, and second,
if such a run can be obtained only in $n + 1$ trials. The probability of
the first hypothesis is $y_n$. To find the probability of the second hypothesis,
we observe that it requires the simultaneous realization of the following
conditions:

(a) There is no run of $r$ in the first $n - r$ trials, the probability of
which is $1 - y_{n-r}$.
(b) In the $(n - r + 1)$st trial, $E$ does not occur, the probability of which is $q = 1 - p$.
(c) Finally, $E$ occurs in the remaining $r$ trials, the probability of which is $p^r$.

As (a), (b), (c) are independent events, their simultaneous materialization
has the probability

$$(1 - y_{n-r})\,q\,p^r.$$

At the same time, this is the probability of the second hypothesis.
Adding it to $y_n$, we must obtain the total probability $y_{n+1}$. Thus

$$y_{n+1} = y_n + (1 - y_{n-r})\,p^r q \tag{3}$$

and this is an ordinary linear difference equation of the order $r + 1$.
Together with the obvious initial conditions

$$y_0 = y_1 = \cdots = y_{r-1} = 0, \qquad y_r = p^r$$

it serves to determine $y_n$ completely for $n = r + 1, r + 2, \ldots$. For
instance, taking $n = r$ we derive from (3)

$$y_{r+1} = p^r + p^r q.$$

Again, taking $n = r + 1$, we obtain

$$y_{r+2} = p^r + 2p^r q$$

and so forth. Although, proceeding thus, step by step, we can find the
required probability $y_n$ for any given $n$, this method becomes very laborious
for large $n$ and does not supply us with information as to the behavior
of $y_n$ for large $n$. It is preferable, therefore, to apply known methods of
solution to equation (3). First we can obtain a homogeneous equation
by introducing $z_n = 1 - y_n$ instead of $y_n$. The resulting equation in
$z_n$ is

$$z_{n+1} - z_n + q p^r z_{n-r} = 0 \tag{4}$$

and the corresponding initial conditions are:

$$z_0 = z_1 = \cdots = z_{r-1} = 1; \qquad z_r = 1 - p^r.$$
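The step-by-step procedure described above is laborious by hand but trivial for a machine. As a minimal sketch (the function name is an assumption, and exact fractions are a choice made here to avoid rounding), equation (3) with its initial conditions can be iterated directly:

```python
from fractions import Fraction


def run_probabilities(p, r, n_max):
    """Return [y_0, ..., y_{n_max}], y_n being the probability of a run of r in n trials.

    Uses the initial conditions y_0 = ... = y_{r-1} = 0, y_r = p^r and
    equation (3) in the form y_n = y_{n-1} + (1 - y_{n-1-r}) p^r q.
    """
    q = 1 - p
    y = []
    for n in range(n_max + 1):
        if n < r:
            y.append(0 * p)
        elif n == r:
            y.append(p ** r)
        else:
            y.append(y[n - 1] + (1 - y[n - 1 - r]) * p ** r * q)
    return y


y = run_probabilities(Fraction(1, 2), 5, 20)
print(float(y[20]))  # probability of a run of 5 heads in 20 tossings of a coin
```

The cost is proportional to $n$, but, as the text notes, the recurrence by itself says nothing about the behavior of $y_n$ for large $n$.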

We could use the method of particular solutions as in the preceding
problem, but it is more convenient to use the method of generating
functions. The power series in $\xi$

$$\varphi(\xi) = z_0 + z_1\xi + z_2\xi^2 + \cdots$$

is the so-called generating function of the sequence $z_0, z_1, z_2, \ldots$.
If we succeed in finding its sum as a definite function of $\xi$, the development
of this function into power series will have precisely $z_n$ as the coefficient
of $\xi^n$. To obtain $\varphi(\xi)$ let us multiply both members of the preceding
series by the polynomial

$$1 - \xi + q p^r \xi^{r+1}.$$

The multiplication performed, we have

$$(1 - \xi + q p^r \xi^{r+1})\varphi(\xi) = z_0 + (z_1 - z_0)\xi + \cdots + (z_{r-1} - z_{r-2})\xi^{r-1} + (z_r - z_{r-1})\xi^r + (z_{r+1} - z_r + q p^r z_0)\xi^{r+1} + \cdots.$$

In the right-hand member the terms involving $\xi^{r+1}, \xi^{r+2}, \ldots$ have
vanishing coefficients by virtue of equation (4); also $z_k - z_{k-1} = 0$ for
$k = 1, 2, 3, \ldots, r - 1$, while

$$z_0 = 1 \quad \text{and} \quad z_r - z_{r-1} = -p^r$$

so that

$$(1 - \xi + q p^r \xi^{r+1})\varphi(\xi) = 1 - p^r \xi^r$$

and

$$\varphi(\xi) = \frac{1 - p^r \xi^r}{1 - \xi + q p^r \xi^{r+1}}.$$

The generating function $\varphi(\xi)$ thus is a rational function and can be
developed into a power series of $\xi$ according to the known rules. The
coefficient of $\xi^n$ gives the general expression for $z_n$. Without any difficulty,
we find the following expression for $z_n$:

$$z_n = \beta_{n,r} - p^r \beta_{n-r,r} \tag{5}$$

where

$$\beta_{n,r} = \sum_{l=0}^{n/(r+1)} (-1)^l\, C_{n-lr}^{\,l}\, (q p^r)^l,$$

the sum extending over all integers $l$ from $0$ up to $n/(r+1)$,
and $\beta_{n-r,r}$ is obtained by substituting $n - r$ instead of $n$. If $n$ is not very
large compared with $r$, formula (5) can be used to compute $z_n$ and

$$y_n = 1 - z_n.$$

For instance, if $n = 20$, $r = 5$, and $p = q = \tfrac12$ we easily find

$$z_{20} = 1 - \frac{15}{64} + \frac{45}{64^2} - \frac{10}{64^3} - \frac{1}{32}\left(1 - \frac{10}{64} + \frac{10}{64^2}\right)$$

and hence

$$z_{20} = 0.75013$$

correct to five decimals; $y_{20} = 0.24987$ is the probability of a run of 5
heads in 20 tossings of a coin.
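Formula (5) can be evaluated mechanically. The sketch below is an added illustration with hypothetical function names; it takes $\beta_{n,r}$ as the coefficient of $\xi^n$ in the expansion of $1/(1 - \xi + qp^r\xi^{r+1})$, so that $\beta_{n,r} = 0$ when $n < 0$, and uses exact fractions.

```python
from fractions import Fraction
from math import comb


def beta(n, r, p):
    """Coefficient of xi^n in 1 / (1 - xi + q p^r xi^(r+1)); zero for n < 0."""
    if n < 0:
        return 0
    a = (1 - p) * p ** r
    return sum((-1) ** l * comb(n - l * r, l) * a ** l for l in range(n // (r + 1) + 1))


def z_exact(n, r, p):
    """Probability of no run of r successes in n trials, by formula (5)."""
    return beta(n, r, p) - p ** r * beta(n - r, r, p)


z20 = z_exact(20, 5, Fraction(1, 2))
print(float(z20), float(1 - z20))
```

Only about $n/(r+1)$ terms are needed for each $\beta$, which is why the formula is convenient when $n$ is not large compared with $r$.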

4. But if $n$ is large in comparison with $r$, formula (5) would require
so much labor that it is preferable to seek for an approximate expression
for $z_n$ which will be useful for large values of $n$. It often happens, and
in many branches of mathematics, but especially so in the theory of
probability, that exact solutions of problems in certain cases are not of
any use. That raises the question of how to supplant them by convenient
approximate formulas that readily yield the required numbers.
Therefore, it is an important problem to find approximate formulas where
exact ones cease to work. Owing to the general importance of approximations,
it will not be out of order to enter into a somewhat long and
complicated investigation to obtain a workable approximate solution
of our problem in the interesting case of a large $n$.

Since $\varphi(\xi)$ is a rational function, the natural way to get an appropriate
expression of $z_n$ would be to resolve $\varphi(\xi)$ into simple fractions, corresponding
to various roots of the denominator, and expand those fractions in
power series of $\xi$. However, to attain definite conclusions following this
method, we must first seek information concerning roots of the equation

$$1 - \xi + q p^r \xi^{r+1} = 0.$$

5. Let

$$f(\xi) = \xi - 1 - \alpha\xi^{r+1}$$

where

$$\alpha = p^r(1 - p).$$

When $p$ varies from 0 to 1, the maximum of $p^r(1 - p)$ is attained for
$p = r/(r+1)$, so that $\alpha \le r^r/(r+1)^{r+1}$ in all cases.

To deal with the most interesting case, we shall assume

$$p < \frac{r}{r+1} \tag{6}$$

which involves

$$\alpha < \frac{r^r}{(r+1)^{r+1}}$$

and we leave it to the reader to discover how the following discussion
should be modified if $p \ge r/(r+1)$.

When $\xi$ starts to increase from 0, the function $f(\xi)$ steadily increases
and attains a positive maximum for $\xi = \xi_0$ where

$$(r + 1)\,\alpha\,\xi_0^{\,r} = 1$$

after which $f(\xi)$ decreases steadily to negative infinity. Hence, there
are two positive roots of the equation $f(\xi) = 0$: $\xi_1$, which is less than
$(r+1)/r$; and another root greater than this number. This root is $1/p$ if
condition (6) is fulfilled.

The remaining roots are all imaginary if $r$ is odd and there is one
negative root among them if $r$ is even.

Now we shall prove that the absolute value of every imaginary or
negative root is $> 1/p$. Let $\rho$ be the absolute value of any such root.
We have first

$$f(\rho) = \rho - 1 - \alpha\rho^{r+1} < 0$$

so that $\rho$ belongs either to the interval $(0, \xi_1)$ or to the interval $(1/p, +\infty)$,
and if we can show that $\rho > \xi_0$ then $\rho$ can be only $> 1/p$. If the root we
consider is negative, $\rho$ satisfies the equation

$$F(\rho) = 1 + \rho - \alpha\rho^{r+1} = 0$$

and since $F(\rho)$ increases till a positive maximum for $\rho = \xi_0$ is reached, and
then decreases, the root of $F(\rho) = 0$ is necessarily $> \xi_0$. If $\xi = \rho e^{i\theta}$ is
an imaginary root of $f(\xi) = 0$ we have, equating imaginary parts,

$$\alpha\rho^r\,\frac{\sin (r+1)\theta}{\sin\theta} = 1. \tag{7}$$

But, whatever $\theta$ may be

$$\left|\frac{\sin (r+1)\theta}{\sin\theta}\right| \le r + 1$$

the equality sign being excluded if $\sin\theta \ne 0$.¹ Hence,

$$(r + 1)\,\alpha\rho^r > 1$$

which implies $\rho > \xi_0$. The statement is thus completely proved.
6. The equation

$$\xi - 1 - \alpha\xi^{r+1} = 0$$

can be exhibited in the form

$$\frac{1}{\xi} + \alpha\xi^r = 1.$$

Substituting $\xi = \rho e^{i\theta}$ here, and again equating imaginary parts, we get

$$\alpha\rho^{r+1}\sin r\theta = \sin\theta$$

and, combining this with (7),

$$\rho = \frac{\sin (r+1)\theta}{\sin r\theta}; \qquad \alpha = \frac{(\sin r\theta)^r \sin\theta}{[\sin (r+1)\theta]^{r+1}}.$$

¹ The extreme values of the ratio $\sin m\theta / \sin\theta$ ($m$ integer $> 1$) correspond to certain
roots of the equation $m \sin\theta \cos m\theta = \sin m\theta \cos\theta$, but for every root of this equation

$$\left|\frac{\sin m\theta}{\sin\theta}\right| = \frac{m}{\sqrt{1 + (m^2 - 1)\sin^2\theta}} \le m.$$

The equality sign is excluded if $\sin\theta$ differs from 0.

If the imaginary part of $\xi$ is positive, the argument $\theta$ is contained
between 0 and $\pi$. In this case, it cannot be less than $\dfrac{\pi}{r+1}$ or greater
than $\pi - \dfrac{\pi}{r+1}$. For, if $0 < \theta < \dfrac{\pi}{r+1}$,

$$\frac{\sin r\theta}{r\theta} > \frac{\sin (r+1)\theta}{(r+1)\theta}$$

or

$$\frac{\sin r\theta}{\sin (r+1)\theta} > \frac{r}{r+1}.$$

At the same time

$$\frac{\sin\theta}{\sin (r+1)\theta} > \frac{1}{r+1}$$

and hence

$$\alpha = \left(\frac{\sin r\theta}{\sin (r+1)\theta}\right)^{r} \frac{\sin\theta}{\sin (r+1)\theta} > \frac{r^r}{(r+1)^{r+1}}$$

which is impossible. That $\theta$ cannot be greater than $\pi - \dfrac{\pi}{r+1}$ follows
simply, because in this case, $\sin (r+1)\theta$ and $\sin r\theta$ would be of opposite
signs and $\rho$ would be negative.


As

$$\frac{\pi}{r+1} \le \theta \le \pi - \frac{\pi}{r+1}$$

we have

$$\rho\sin\theta \ge \rho\sin\frac{\pi}{r+1}.$$

On the other hand, $\sin x > 2x/\pi$ if $0 < x < \pi/2$ and $\rho > 1/p$. Hence,

$$\rho\sin\theta > \frac{2}{(r+1)p}.$$

Thus, imaginary parts of all complex roots have the same lower bound

$$\frac{2}{(r+1)p}$$

of their absolute values.

7. Denoting the roots of the equation $f(\xi) = 0$ by

$$\xi_k \qquad (k = 1, 2, \ldots, r+1)$$

we have

$$\varphi(\xi) = \sum_{k=1}^{r+1} \frac{1 - p\xi_k}{(1-p)\,\xi_k\,(r + 1 - r\xi_k)} \cdot \frac{1}{1 - \xi/\xi_k}.$$



Hence, expanding each term into power series of $\xi$ and collecting
coefficients of $\xi^n$, we find

$$z_n = \sum_{k=1}^{r+1} \frac{1 - p\xi_k}{1 - p} \cdot \frac{\xi_k^{-n-1}}{r + 1 - r\xi_k}.$$

For every imaginary root, we have

$$\left|\frac{(1 - p\xi_k)\,\xi_k^{-n}}{(1-p)\,\xi_k\,(r + 1 - r\xi_k)}\right| < \frac{r+1}{r(1-p)}\,p^{n+2}$$

since

$$|\xi_k^{-n}| < p^n; \qquad |\xi_k^{-1} - p| < 2p; \qquad \frac{1}{|r + 1 - r\xi_k|} < \frac{(r+1)p}{2r}.$$


If $r$ is odd, there are $r - 1$ imaginary roots and the part in the expression
of $z_n$ due to them in absolute value is less than

$$\frac{(r+1)(r-1)}{r(1-p)}\,p^{n+2} < \frac{r\,p^{n+2}}{1-p}.$$

The term corresponding to the root $1/p$ vanishes, so that finally

$$z_n = \frac{1 - p\xi_1}{(1-p)\,\xi_1} \cdot \frac{\xi_1^{-n}}{r + 1 - r\xi_1} + \theta\,\frac{r\,p^{n+2}}{1-p}$$

where $|\theta| < 1$ and $\xi_1$ denotes the least positive root of the equation

$$1 - \xi + q p^r \xi^{r+1} = 0.$$

If $r$ is even, there is one negative root. The part of $z_n$ corresponding
to this root is less than

$$\frac{2p^{n+2}}{(1-p)\,r}.$$

The whole contribution due to imaginary and negative roots is less than

$$\frac{r^2 - r}{r(1-p)}\,p^{n+2} < \frac{r\,p^{n+2}}{1-p}$$

in absolute value. Thus, no matter whether $r$ is odd or even, we have

$$z_n = \frac{1 - p\xi_1}{(1-p)\,\xi_1} \cdot \frac{\xi_1^{-n}}{r + 1 - r\xi_1} + \theta\,\frac{r\,p^{n+2}}{1-p}; \qquad -1 < \theta < 1. \tag{8}$$

This is the required expression for $z_n$, excellently adapted to the case of a
large value for $n$, since then the remainder term involving $\theta$ is completely
negligible in comparison with the first principal term.
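How good the principal term of (8) is can be seen numerically. The sketch below is an added illustration, not the author's procedure: it finds the least positive root by the simple fixed-point iteration $\xi \leftarrow 1 + \alpha\xi^{r+1}$ (a choice made here, which converges when $p < r/(r+1)$), evaluates the principal term, and compares it with the value obtained by iterating equation (4) in floating point. The function names and the sample case $p = \tfrac12$, $r = 10$ are assumptions of the sketch.

```python
def least_positive_root(p, r, iterations=200):
    """Least positive root of 1 - xi + q p^r xi^(r+1) = 0 by fixed-point iteration."""
    a = (1 - p) * p ** r
    xi = 1.0
    for _ in range(iterations):
        xi = 1.0 + a * xi ** (r + 1)
    return xi


def z_principal_term(n, p, r):
    """Principal term of (8): contribution of the least positive root alone."""
    xi = least_positive_root(p, r)
    coefficient = (1 - p * xi) / ((1 - p) * xi * (r + 1 - r * xi))
    return coefficient * xi ** (-n)


def z_by_recurrence(n, p, r):
    """z_n from equation (4): z_{m+1} = z_m - q p^r z_{m-r}."""
    a = (1 - p) * p ** r
    z = [1.0] * r + [1.0 - p ** r]
    for m in range(r, n):
        z.append(z[m] - a * z[m - r])
    return z[n]


for n in (100, 1000):
    print(n, z_principal_term(n, 0.5, 10), z_by_recurrence(n, 0.5, 10))
```

In this sample case the two values agree to many decimal places already for $n = 100$, which illustrates how quickly the contribution of the remaining roots dies out.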
The root $\xi_1$ can be found either by direct solution of the trinomial
equation following Gauss' method, or by application of Lagrange's series.
Applying Lagrange's series, we have

$$\xi_1 = 1 + \alpha + \sum_{l=2}^{\infty} \frac{(lr+2)(lr+3)\cdots(lr+l)}{1\cdot 2\cdots l}\,\alpha^l$$

$$\log \xi_1 = \alpha + \sum_{l=2}^{\infty} \frac{(lr+1)(lr+2)\cdots(lr+l-1)}{1\cdot 2\cdots l}\,\alpha^l$$

both series being convergent if $|\alpha| < r^r/(r+1)^{r+1}$ and this condition is
satisfied.

8. Let us apply the approximate formula (8) to the case $p = \tfrac12$
and $r = 10$. Using Lagrange's series, we find that

$$\xi_1 = 1.0004909$$

and

$$z_n = 1.003947 \cdot (1.0004909)^{-n} + \frac{5\theta}{2^n}.$$

Hence, for $n = 100$, $1{,}000$, $10{,}000$, respectively,

$$z_n = 0.9559; \quad 0.6146; \quad 0.0074$$

so that, for instance, the probabilities of a run of at least 10 heads in
100, 1,000, or 10,000 throws of a coin are, respectively,

$$0.0441; \quad 0.3854; \quad 0.9926.$$

Thus, in 10,000 throws, it is quite likely that heads would turn up 10 or
more times in succession.

In general, for a given $r$ and increasing $n$, the probability $y_n$ tends to 1,
so that in a very long series of trials, runs of any length are extremely
likely to occur, a conclusion which at first sight seems paradoxical.

9. In the preceding examples, an unknown probability was determined
by an ordinary equation in finite differences. Very often, however,
probability as a function of two or more independent variables is
defined by a partial difference equation in two or more independent
variables, together with a set of initial conditions suggested by the
problem itself. A few examples will suffice to illustrate the use of
partial equations in finite differences and to give an idea of the two
principal methods for their solution; namely, Laplace's method of
generating functions, and the less well known, but elegant, method
proposed by Lagrange.

We start with an analytical solution of the problem which was discussed
in detail in Chap. III.

Problem 3. Find the probability of exactly $x$ successes in $t$ independent
trials with the constant probability $p$.

Solution by Laplace's Method. Let us denote the required probability
by $y_{x,t}$. To obtain $x$ successes in $t$ trials can be possible only in
two mutually exclusive ways: (a) by obtaining $x$ successes in $t - 1$ trials
and a failure at the last trial; (b) by obtaining success at the last trial
and $x - 1$ successes in the preceding $t - 1$ trials. The probability of
case (a) is $q\,y_{x,t-1}$ and that of case (b) is $p\,y_{x-1,t-1}$. The total probability
$y_{x,t}$ satisfies the equation

$$y_{x,t} = p\,y_{x-1,t-1} + q\,y_{x,t-1} \tag{9}$$

for all positive $x$ and $t$. This equation alone does not determine $y_{x,t}$
completely, but it does so in connection with certain initial conditions.
These conditions are

$$y_{x,0} = 0 \ \text{ if } x > 0, \qquad y_{0,t} = q^t \ \text{ if } t \ge 0. \tag{10}$$

The first set of equations is obvious; the second set is the expression
of the fact that if there are no successes in $t$ trials, the failures occur $t$
times in succession, and the probability for that is $q^t$.

Following Laplace, we introduce for a given $t$ the generating function
of $y_{0,t}, y_{1,t}, y_{2,t}, \ldots$, that is, the power series

$$\varphi_t(\xi) = y_{0,t} + y_{1,t}\xi + y_{2,t}\xi^2 + \cdots = \sum_{x=0}^{\infty} y_{x,t}\,\xi^x.$$

Taking $t - 1$ instead of $t$, separating the first term and multiplying by
$q$, we have

$$q\varphi_{t-1}(\xi) = q\,y_{0,t-1} + \sum_{x=1}^{\infty} q\,y_{x,t-1}\,\xi^x$$

and similarly

$$p\xi\varphi_{t-1}(\xi) = \sum_{x=1}^{\infty} p\,y_{x-1,t-1}\,\xi^x.$$

Adding and noting equation (9) we obtain

$$(p\xi + q)\varphi_{t-1}(\xi) = \varphi_t(\xi) + q\,y_{0,t-1} - y_{0,t}$$

but because of (10)

$$q\,y_{0,t-1} - y_{0,t} = q^t - q^t = 0$$

and hence,

$$\varphi_t(\xi) = (p\xi + q)\varphi_{t-1}(\xi)$$

for every positive $t$. Taking $t = 1, 2, 3, \ldots$ and performing successive
substitutions, we get

$$\varphi_t(\xi) = (p\xi + q)^t \varphi_0(\xi)$$

and it remains only to find

$$\varphi_0(\xi) = y_{0,0} + y_{1,0}\xi + y_{2,0}\xi^2 + \cdots.$$

But on account of (10), $y_{x,0} = 0$ for $x > 0$, while $y_{0,0} = 1$. Thus,

$$\varphi_0(\xi) = 1$$

and

$$\varphi_t(\xi) = (p\xi + q)^t.$$

To find $y_{x,t}$ it remains to develop the right-hand member in a power series
of $\xi$ and to find the coefficient of $\xi^x$. The binomial theorem readily gives

$$y_{x,t} = \frac{t(t-1)\cdots(t-x+1)}{1\cdot 2\cdots x}\,p^x q^{t-x}.$$
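The partial difference equation (9) with the initial conditions (10) can also be solved numerically, row by row in $t$, and compared with the binomial expression just obtained. This is a minimal sketch with an assumed function name; exact fractions are used so that the comparison is exact.

```python
from fractions import Fraction
from math import comb


def successes_table(p, t_max):
    """table[t][x] = y_{x,t}, built from y_{x,t} = p*y_{x-1,t-1} + q*y_{x,t-1}."""
    q = 1 - p
    table = [[Fraction(1)]]  # t = 0: y_{0,0} = 1 and y_{x,0} = 0 for x > 0
    for t in range(1, t_max + 1):
        prev = table[-1]
        row = []
        for x in range(t + 1):
            failure_last = q * prev[x] if x < len(prev) else 0   # case (a)
            success_last = p * prev[x - 1] if x >= 1 else 0      # case (b)
            row.append(failure_last + success_last)
        table.append(row)
    return table


p = Fraction(1, 3)
table = successes_table(p, 7)
print(table[7][3], comb(7, 3) * p**3 * (1 - p)**4)
```

Each row of the table is the list of coefficients of $(p\xi + q)^t$, which is exactly what the generating-function argument asserts.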


10. Poisson's Series of Trials. The analytical method thus enables
us to find the same expression for probabilities in a Bernoullian series
of trials as that obtained in Chap. III by elementary means. Considering
how simple it is to arrive at this expression, it may appear that a new
deduction of a known result is not a great gain. But one must bear in
mind that a little modification of the problem may bring new difficulties
which may be more easily overcome by the new method than by a generalization
of the old one. Poisson substituted for the Bernoullian series
another series of independent trials with probability varying from
trial to trial, so that in trials 1, 2, 3, 4, ... the same event $E$ has different
probabilities $p_1, p_2, p_3, p_4, \ldots$ and correspondingly, the opposite event
has probabilities $q_1, q_2, q_3, q_4, \ldots$ where $q_k = 1 - p_k$, in general. Now,
for the Poisson series, the same question may be asked: what is the
probability $y_{x,t}$ of obtaining $x$ successes in $t$ trials? The solution of this
generalized problem is easier and more elegant if we make use of difference
equations.

First, in the same manner as before, we can establish the equation in
finite differences

$$y_{x,t} = p_t\,y_{x-1,t-1} + q_t\,y_{x,t-1}. \tag{11}$$

The corresponding set of initial conditions is

$$y_{x,0} = 0 \ \text{ if } x > 0; \qquad y_{0,t} = q_1 q_2 \cdots q_t \ \text{ if } t > 0; \qquad y_{0,0} = 1. \tag{12}$$

Giving $\varphi_t(\xi)$ the same meaning as above, we have

$$q_t\varphi_{t-1}(\xi) = q_t\,y_{0,t-1} + \sum_{x=1}^{\infty} q_t\,y_{x,t-1}\,\xi^x$$

$$p_t\xi\varphi_{t-1}(\xi) = \sum_{x=1}^{\infty} p_t\,y_{x-1,t-1}\,\xi^x$$

whence

$$(p_t\xi + q_t)\varphi_{t-1}(\xi) = \varphi_t(\xi) + q_t\,y_{0,t-1} - y_{0,t};$$

but because of (12)

$$q_t\,y_{0,t-1} - y_{0,t} = q_1 q_2 \cdots q_t - q_1 q_2 \cdots q_t = 0,$$

and thus

$$\varphi_t(\xi) = (p_t\xi + q_t)\varphi_{t-1}(\xi)$$

whence again

$$\varphi_t(\xi) = (p_1\xi + q_1)(p_2\xi + q_2)\cdots(p_t\xi + q_t)\varphi_0(\xi).$$

However, by virtue of (12), $\varphi_0(\xi) = 1$ so that finally

$$\varphi_t(\xi) = (p_1\xi + q_1)(p_2\xi + q_2)\cdots(p_t\xi + q_t).$$

To find the probability of $x$ successes in $t$ trials in Poisson's case, one
needs only to develop the product

$$(p_1\xi + q_1)(p_2\xi + q_2)\cdots(p_t\xi + q_t)$$

according to ascending powers of $\xi$ and to find the coefficient of $\xi^x$.
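Developing the product is a matter of multiplying by one linear factor at a time. The sketch below is an added illustration; the function name and the three sample probabilities are hypothetical and not taken from the text.

```python
from fractions import Fraction


def poisson_trials_distribution(probabilities):
    """Coefficients of the product of (p_k*xi + q_k): entry x is P(x successes)."""
    coefficients = [Fraction(1)]  # phi_0(xi) = 1
    for p in probabilities:
        q = 1 - p
        updated = [Fraction(0)] * (len(coefficients) + 1)
        for x, c in enumerate(coefficients):
            updated[x] += q * c      # failure in this trial: x unchanged
            updated[x + 1] += p * c  # success in this trial: x increases by 1
        coefficients = updated
    return coefficients


# Hypothetical probabilities for three trials.
print(poisson_trials_distribution([Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]))
```

Each multiplication step is exactly equation (11) applied to a whole row of values $y_{x,t-1}$ at once.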

11. Solution by Lagrange's Method. We shall now apply to equation
(9) the ingenious method devised by Lagrange, with a slight modification
intended to bring into full light the fundamental idea underlying this
method. Equation (9) possesses particular solutions of the form

$$\alpha^x\beta^t$$

if $\alpha$ and $\beta$ are connected by the equation

$$\alpha\beta = p + q\alpha.$$

Solving this equation for $\beta$, we find infinitely many particular solutions

$$\alpha^x(q + p\alpha^{-1})^t$$

where $\alpha$ is absolutely arbitrary. Multiplying this expression by an
arbitrary function $\varphi(\alpha)$ and integrating between arbitrary limits, we
obtain other solutions of equation (9). Now the question arises of how
to choose $\varphi(\alpha)$ and the path of integration to satisfy not only equation (9)
but also initial conditions (10). We shall assume that $\varphi(\alpha)$ is a regular
function of a complex variable $\alpha$ in a ring between two concentric circles,
with their center at the origin, and that it can therefore be represented in
this ring by Laurent's series

$$\varphi(\alpha) = \sum_{n=-\infty}^{\infty} A_n\alpha^n.$$

If $c$ is a circle concentric with the regularity ring of $\varphi(\alpha)$ and situated
inside it, the integral

$$y_{x,t} = \frac{1}{2\pi i}\int_c \alpha^{x-1}(q + p\alpha^{-1})^t\,\varphi(\alpha)\,d\alpha$$

is perfectly determined and represents a solution of (9). To satisfy
the initial conditions, we have first the set of equations

$$\frac{1}{2\pi i}\int_c \alpha^{x-1}\varphi(\alpha)\,d\alpha = 0 \quad \text{for} \quad x = 1, 2, 3, \ldots$$

which show that all the coefficients with negative subscripts vanish,
and that $\varphi(\alpha)$ is regular about the origin. The second set of equations
obtained by setting $x = 0$

$$\frac{1}{2\pi i}\int_c (q + p\alpha^{-1})^t\,\varphi(\alpha)\,\frac{d\alpha}{\alpha} = q^t \quad \text{for} \quad t = 0, 1, 2, \ldots$$

serves to determine $\varphi(\alpha)$. If $\epsilon$ is a sufficiently small complex parameter,
this set of equations is entirely equivalent to a single equation:

$$\frac{1}{2\pi i}\int_c \frac{\varphi(\alpha)\,d\alpha}{\alpha - \epsilon(p + q\alpha)} = \frac{1}{1 - \epsilon q}.$$

Now the integrand within the circle $c$ has a single pole $\alpha_0$ determined by
the equation

$$\alpha_0 = \epsilon(p + q\alpha_0)$$

and the corresponding residue is

$$\frac{\varphi(\alpha_0)}{1 - q\epsilon}.$$

At the same time, this is the value of the left-hand member of the above
equation, so that

$$\frac{\varphi(\alpha_0)}{1 - q\epsilon} = \frac{1}{1 - q\epsilon}$$

or

$$\varphi(\alpha_0) = 1$$

for all sufficiently small $\epsilon$ or $\alpha_0$. That is, $\varphi(\alpha) = 1$ and

$$y_{x,t} = \frac{1}{2\pi i}\int_c \frac{(p + q\alpha)^t}{\alpha^{t-x+1}}\,d\alpha$$

is the required solution. It remains to find the residue of the integrand;
that is, the coefficient of $1/\alpha$ in the development of

$$\frac{(p + q\alpha)^t}{\alpha^{t-x+1}}$$

in series of ascending powers of $\alpha$. That can be easily done, using the
binomial development, and we obtain

$$y_{x,t} = C_t^x\,p^x q^{t-x}$$

as it should be.
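The contour integral obtained by Lagrange's method can also be evaluated numerically. The sketch below is an added illustration under stated assumptions: the circle $c$ is taken to be the unit circle, and the integral is replaced by an average over equally spaced points, which is exact here (up to rounding) because the integrand is a finite sum of powers of $\alpha$. The function name is hypothetical.

```python
import cmath
from math import comb, pi


def contour_solution(x, t, p):
    """Approximate (1/(2*pi*i)) * integral over |alpha| = 1 of alpha^(x-t-1) (p + q*alpha)^t d(alpha).

    With alpha = exp(i*theta), d(alpha) = i*alpha*d(theta), so the integral is
    the mean value of alpha^(x-t) * (p + q*alpha)^t over the circle.
    """
    q = 1 - p
    points = max(x, t) + 2  # more points than the largest power of alpha involved
    total = 0
    for k in range(points):
        alpha = cmath.exp(2j * pi * k / points)
        total += alpha ** (x - t) * (p + q * alpha) ** t
    return (total / points).real


print(contour_solution(3, 7, 0.3), comb(7, 3) * 0.3**3 * 0.7**4)
```

The average picks out the term free of $\alpha$, which is the residue computation of the text carried out numerically.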

12. Problem 4. Two players, $A$ and $B$, agree to play a series of
games on the condition that $A$ wins the series if he succeeds in winning $a$
games before $B$ wins $b$ games. The probability of winning a single game
is $p$ for $A$ and $q = 1 - p$ for $B$, so that each game must be won by either
$A$ or $B$. What is the probability that $A$ will win the series?

Solution. This historically important problem was proposed as an
exercise (Prob. 12, page 58) with a brief indication of its solution based
on elementary principles. To solve it analytically, let us denote by
$y_{x,t}$ the probability that $A$ will win when $x$ games remain for him to win,
while his adversary $B$ has $t$ games left to win. Considering the result
of the game immediately following, we distinguish two alternatives:
(a) $A$ wins the next game (probability $p$) and has to win $x - 1$ games
before $B$ wins $t$ games (probability $y_{x-1,t}$); (b) $A$ loses the next game
(probability $q$) and has to win $x$ games before $B$ can win $t - 1$ games
(probability $y_{x,t-1}$). The probabilities of these two alternatives being
$p\,y_{x-1,t}$ and $q\,y_{x,t-1}$, their sum is the total probability $y_{x,t}$. Thus, $y_{x,t}$
satisfies the equation

$$y_{x,t} = p\,y_{x-1,t} + q\,y_{x,t-1}. \tag{13}$$

Now, $y_{x,0} = 0$ for $x > 0$, which means that $A$ cannot win, $B$ having
won all his games. Also, $y_{0,t} = 1$ for $t > 0$, which means that $A$ surely
wins when he has no more games to win. The initial conditions in our
problem are, therefore,

$$y_{x,0} = 0 \ \text{ if } x > 0; \qquad y_{0,t} = 1 \ \text{ if } t > 0. \tag{14}$$

The symbol $y_{0,0}$ has no meaning as a probability, and remains undefined.
For the sake of simplicity we shall assume, however, that $y_{0,0} = 0$.

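Application of Laplace's Method. Again, let

$$\varphi_x(\xi) = y_{x,0} + y_{x,1}\xi + y_{x,2}\xi^2 + \cdots$$

be the generating function of the sequence $y_{x,0}, y_{x,1}, y_{x,2}, \ldots$
corresponding to an arbitrary $x > 0$. We have

$$q\xi\varphi_x(\xi) = \sum_{t=1}^{\infty} q\,y_{x,t-1}\,\xi^t$$

$$p\varphi_{x-1}(\xi) = p\,y_{x-1,0} + \sum_{t=1}^{\infty} p\,y_{x-1,t}\,\xi^t$$

and so

$$q\xi\varphi_x(\xi) + p\varphi_{x-1}(\xi) = p\,y_{x-1,0} + \sum_{t=1}^{\infty} (p\,y_{x-1,t} + q\,y_{x,t-1})\,\xi^t$$

or, because of (13),

$$q\xi\varphi_x(\xi) + p\varphi_{x-1}(\xi) = p\,y_{x-1,0} - y_{x,0} + \varphi_x(\xi).$$

Now, for every $x > 0$

$$y_{x,0} = y_{x-1,0} = 0$$

in conformity with the first set of initial conditions, which allows us to
present the preceding relation as follows:

$$\varphi_x(\xi) = \frac{p}{1 - q\xi}\,\varphi_{x-1}(\xi)$$

whence

$$\varphi_x(\xi) = \frac{p^x}{(1 - q\xi)^x}\,\varphi_0(\xi).$$

But

$$\varphi_0(\xi) = y_{0,0} + y_{0,1}\xi + y_{0,2}\xi^2 + \cdots = \xi + \xi^2 + \cdots = \frac{\xi}{1 - \xi}$$

and finally

$$\varphi_x(\xi) = \frac{p^x\,\xi}{(1 - \xi)(1 - q\xi)^x}.$$

It remains to develop the right-hand member in a power series of $\xi$ and
find the coefficient of $\xi^t$. As

$$\frac{\xi}{1 - \xi} = \xi + \xi^2 + \xi^3 + \cdots$$

and

$$\frac{1}{(1 - q\xi)^x} = 1 + \frac{x}{1}\,q\xi + \frac{x(x+1)}{1\cdot 2}\,q^2\xi^2 + \cdots$$

we readily get, multiplying these series according to the ordinary rules,

$$y_{x,t} = p^x\left[1 + \frac{x}{1}\,q + \frac{x(x+1)}{1\cdot 2}\,q^2 + \cdots + \frac{x(x+1)\cdots(x+t-2)}{1\cdot 2\cdots(t-1)}\,q^{t-1}\right]$$

which coincides with the elementary solution indicated on page 58.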
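The closed expression can be compared with a direct tabulation of equation (13) under the conditions (14). The following is a minimal sketch with assumed function names; it sets $y_{0,0} = 0$ as in the text and uses exact fractions.

```python
from fractions import Fraction
from math import comb


def points_by_recurrence(p, a, b):
    """y_{a,b} from y_{x,t} = p*y_{x-1,t} + q*y_{x,t-1} with conditions (14)."""
    q = 1 - p
    y = [[Fraction(0)] * (b + 1) for _ in range(a + 1)]  # y[x][0] = 0, y[0][0] = 0
    for t in range(1, b + 1):
        y[0][t] = Fraction(1)  # A has no more games to win
    for x in range(1, a + 1):
        for t in range(1, b + 1):
            y[x][t] = p * y[x - 1][t] + q * y[x][t - 1]
    return y[a][b]


def points_closed_form(p, a, b):
    """p^a * sum over j < b of C(a+j-1, j) q^j, the bracketed expression above."""
    q = 1 - p
    return p ** a * sum(comb(a + j - 1, j) * q ** j for j in range(b))


print(points_by_recurrence(Fraction(1, 2), 2, 3), points_closed_form(Fraction(1, 2), 2, 3))
```

The term with $q^j$ corresponds to $A$ winning the series after losing exactly $j$ games.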

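Application of Lagrange's Method. Equation (13) has particular
solutions of the form

$$\alpha^x\beta^t$$

where

$$\alpha\beta = p\beta + q\alpha.$$

Hence, we can either express $\alpha$ by $\beta$ or $\beta$ by $\alpha$. Leaving it to the reader
to follow the second alternative, we shall express $\alpha$ as a function of $\beta$
and seek the required solution in the form

$$y_{x,t} = \frac{p^x}{2\pi i}\int_c \frac{\beta^{t-1}\varphi(\beta)\,d\beta}{(1 - q\beta^{-1})^x}$$

where $\varphi(\beta)$ is again supposed to be developable in Laurent's series in a
certain ring; $c$ is a circle described about the origin and entirely within
that ring. Setting $x = 0$, we must have

$$\frac{1}{2\pi i}\int_c \beta^{t-1}\varphi(\beta)\,d\beta = 1 \quad \text{for} \quad t = 1, 2, 3, \ldots$$

and this set of equations is satisfied if we take

$$\varphi(\beta) = \frac{1}{\beta} + \frac{1}{\beta^2} + \cdots = \frac{1}{\beta - 1}; \qquad |\beta| > 1.$$

Now we have

$$y_{x,t} = \frac{p^x}{2\pi i}\int_c \frac{\beta^{t-1}\,d\beta}{(1 - q\beta^{-1})^x(\beta - 1)}$$

and for $t = 0$

$$y_{x,0} = \frac{p^x}{2\pi i}\int_c \frac{\beta^{-1}\,d\beta}{(1 - q\beta^{-1})^x(\beta - 1)} = 0$$

as it should be, because for $|\beta| > 1$ the integrand can be developed into a
power series of $1/\beta$, the term with $1/\beta$ being absent. Thus, the required
solution is given by

$$y_{x,t} = \frac{p^x}{2\pi i}\int_c \frac{\beta^{t-1}\,d\beta}{(1 - q\beta^{-1})^x(\beta - 1)}$$

where $c$ is a circle of radius $> 1$ described about the origin. The final
expression for $y_{x,t}$ is obtained as the coefficient of $1/\beta$ in the development
of

$$\frac{p^x\beta^{t-1}}{(1 - q\beta^{-1})^x(\beta - 1)}$$

into power series of $1/\beta$. We obtain the same expression as before.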


Problems for Solution

1. Each of $n$ urns contains $a$ white and $b$ black balls. One ball is transferred from
the first urn into the second, another one from the second into the third, and so on.
Finally, a ball is drawn from the $n$th urn. What is the probability that it is white,
when it is known that the first ball transferred was white?

Ans. $$p_n = \frac{a}{a+b} + \frac{b}{a+b}\,(a + b + 1)^{-n+1}.$$

2. Two urns contain, respectively, $a$ white and $b$ black, and $b$ white and $a$ black
balls. A series of drawings is made, according to the following rules:

a. Each time only one ball is drawn and immediately returned to the same urn it
came from.

b. If the ball drawn is white, the next drawing is made from the first urn.

c. If it is black, the next drawing is made from the second urn.

d. The first ball drawn comes from the first urn.

What is the probability that the $n$th ball drawn will be white?

Ans. $$p_n = \frac{1}{2} + \frac{1}{2}\left(\frac{a-b}{a+b}\right)^n.$$

3. Find the probability of a run of 5 in a series of 15 trials with constant probability
$p = \tfrac13$. Ans. $y_{15} = 23\cdot 3^{-6} - 70\cdot 3^{-12} = 0.0314184$.

4. How many throws of a coin suffice to give a probability of more than 0.999 for
a run of at least 100 heads? Ans. $1.76\cdot 10^{31}$ throws suffice.

5. What is the least number of trials assuring a probability of $\ge \tfrac12$ for a run of at
least 10 successes if $p = q = \tfrac12$? Ans. 1,420.

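6. Seven urns contain black and white balls in the following proportions:

Urns     1   2   3   4   5   6   7
White    1   2   2   3   2   3   4
Black    2   1   2   1   1   2   5

One ball is drawn from each urn. What is the probability that there will be among
them exactly 3 white balls? Ans. Coefficient of $\xi^3$ in

$$\left(\tfrac13\xi + \tfrac23\right)\left(\tfrac23\xi + \tfrac13\right)\left(\tfrac12\xi + \tfrac12\right)\left(\tfrac34\xi + \tfrac14\right)\left(\tfrac23\xi + \tfrac13\right)\left(\tfrac35\xi + \tfrac25\right)\left(\tfrac49\xi + \tfrac59\right)$$

or

$$y_{3,7} = 0.23004.$$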


7. Two players, each possessing \$2, agree to play a series of games. The probability
of winning a single game is $\tfrac12$ for both, and the loser pays \$1 to his adversary
after each game. Find the probability for each one of them to be ruined at or before
the $n$th game.

Solution. Let $y_m$ be the probability that after playing $2m$ games, neither of the
players is ruined. We have

$$y_{m+1} = \tfrac12\,y_m$$

and hence

$$y_m = \frac{1}{2^m}.$$

The probability for one of the players to be ruined at or before the $n$th game is

$$\frac{1}{2} - \frac{1}{2^{m+1}}$$

if $n = 2m$ or $n = 2m + 1$.

8. Solve the same problem if each player enters the game with \$3.

Ans. $\dfrac12\left[1 - \left(\dfrac34\right)^{m-1}\right]$ if $n = 2m - 1$ or $n = 2m$.
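Both ruin problems can be checked by following the distribution of one player's capital game by game. The sketch below is an added illustration with a hypothetical function name; it computes the probability that a given player is ruined at or before the $n$th game when both enter with the same stake and each game is won with probability $\tfrac12$. For a stake of 2 it reproduces $\tfrac12 - 1/2^{m+1}$, and for a stake of 3 it agrees with $\tfrac12\left[1 - (\tfrac34)^{m-1}\right]$ for $n = 2m - 1$ or $n = 2m$.

```python
from fractions import Fraction


def ruin_probability(stake, n):
    """P(a given player is ruined at or before game n), equal stakes, fair games."""
    total = 2 * stake
    distribution = {stake: Fraction(1)}  # capital of the given player -> probability
    ruined = Fraction(0)
    for _ in range(n):
        updated = {}
        for capital, prob in distribution.items():
            for new_capital in (capital - 1, capital + 1):
                if new_capital == 0:
                    ruined += prob / 2          # the given player is ruined
                elif new_capital < total:
                    updated[new_capital] = updated.get(new_capital, 0) + prob / 2
                # new_capital == total: the adversary is ruined and play stops
        distribution = updated
    return ruined


print([ruin_probability(2, n) for n in range(1, 7)])
print([ruin_probability(3, n) for n in range(1, 7)])
```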

9. Players $A_1, A_2, \ldots, A_{n+1}$ play a series of games in the following order: first $A_1$
plays with $A_2$; the loser is out and the winner plays with the following player, $A_3$; the
loser is out again and the next game is played with $A_4$, and so on; the loser always being
out and his place taken by the next following player. The probability of winning a
single game is $\tfrac12$ for each player and the series is won by the player who succeeds in
winning over all his adversaries in succession. What is the probability that the
series will stop exactly at the $x$th game? What is the probability that the series will
stop before or at the $x$th game?

Solution. Let $y_x$ be the probability that the series terminates exactly at the $x$th
game. That means that the player who won the game entered at the $(x - n + 1)$st
game and won successively the $n$ following games. Now, there are $n - 1$ cases
to be distinguished according as the player beaten at the $(x - n + 1)$st game has
already won $1, 2, 3, \ldots, n - 1$ games. Let $p_k$ be the probability that the loser in the
$(x - n + 1)$st game previously has won $k$ games. The probability of ending the
series in this case is $p_k/2^n$. On the other hand,

$$y_{x-k} = \frac{p_k}{2^{n-k}}$$

so that

$$\frac{p_k}{2^n} = \frac{y_{x-k}}{2^k}.$$

Hence, for $x > n$

$$y_x = \frac{1}{2}\,y_{x-1} + \frac{1}{4}\,y_{x-2} + \cdots + \frac{1}{2^{n-1}}\,y_{x-n+1}.$$

Initial conditions:

$$y_1 = y_2 = \cdots = y_{n-1} = 0; \qquad y_n = \frac{1}{2^{n-1}}.$$

The generating function of $y_x$:

$$y_1 + y_2\xi + y_3\xi^2 + \cdots = \frac{\xi^{n-1}(2 - \xi)}{2^n(1 - \xi) + \xi^n}$$

and the generating function of the probability that the series will end before or at the
$x$th game is

$$\frac{\xi^{n-1}(2 - \xi)}{(1 - \xi)\left[2^n(1 - \xi) + \xi^n\right]}.$$



10. Three players, $A$, $B$, $C$, play a series of games, each game being won by one of
them. If the probabilities for $A$, $B$, $C$ to win a single game are $p$, $q$, $r$, find the
probability of $A$ winning $a$ games before $B$ and $C$ win $b$ and $c$ games, respectively.

Solution. Let $A_{x,y,z}$ denote the probability for $A$ to win the series when he has
still to win $x$ games, while $B$ and $C$ have to win $y$ and $z$ games, respectively. First,
we can establish the equation

$$A_{x,y,z} = p\,A_{x-1,y,z} + q\,A_{x,y-1,z} + r\,A_{x,y,z-1}.$$

Next, $A_{0,y,z} = 1$ for positive $y$, $z$, and $A_{x,0,z} = 0$ for positive $x$, $z$; $A_{x,y,0} = 0$ for
positive $x$, $y$. Besides, although this is only a formal simplification, we shall assume
$A_{x,0,z} = 0$, $A_{x,y,0} = 0$ when $x$ or $y$ or $z$ vanishes. For the generating function of
$A_{x,y,z}$

$$\Phi_x(\xi, \eta) = \sum_{y,z=0}^{\infty} A_{x,y,z}\,\xi^y\eta^z$$

we find the equation

$$\Phi_x(\xi, \eta) = \frac{p}{1 - q\xi - r\eta}\,\Phi_{x-1}(\xi, \eta)$$

whence

$$\Phi_x(\xi, \eta) = \frac{p^x\,\xi\eta}{(1 - q\xi - r\eta)^x\,(1 - \xi)(1 - \eta)}.$$

The final answer is

$$A_{a,b,c} = p^a\left[1 + \frac{a}{1}(q + r) + \frac{a(a+1)}{1\cdot 2}(q + r)^2 + \frac{a(a+1)(a+2)}{1\cdot 2\cdot 3}(q + r)^3 + \cdots\right]'$$

the dash indicating that powers of $q$ and $r$ with the exponents $\ge b$ and $\ge c$ are omitted.

Obviously, the same method can be extended to any number of players, and leads
to a perfectly analogous expression of probability.

11. An urn contains $n$ balls altogether, and among them $a$ white balls. In a series
of drawings, each time one ball is drawn and, whatever its color may be, it is replaced by
a white ball. Find the probability $y_{x,r}$ that after $r$ drawings there are $x$ white balls
in the urn.

Solution. The required probability satisfies the equation

$$y_{x,r+1} = \frac{n - x + 1}{n}\,y_{x-1,r} + \frac{x}{n}\,y_{x,r}.$$

Besides,

$$y_{a,0} = 1, \qquad y_{x,0} = 0 \ \text{ if } x \ne a, \qquad y_{x,r} = 0 \ \text{ if } x < a.$$

From the preceding equation, combined with the initial conditions, we find
successively

$$y_{a,r} = \left(\frac{a}{n}\right)^r$$

$$y_{a+1,r} = (n - a)\left[\left(\frac{a+1}{n}\right)^r - \left(\frac{a}{n}\right)^r\right]$$

$$y_{a+2,r} = \frac{(n - a)(n - a - 1)}{1\cdot 2}\left[\left(\frac{a+2}{n}\right)^r - 2\left(\frac{a+1}{n}\right)^r + \left(\frac{a}{n}\right)^r\right]$$

and so on.
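The successive expressions in Problem 11 can be verified by iterating the difference equation. The sketch below is an added illustration with hypothetical function names. The general term used in `closed_form`, $C_{n-a}^{k}\sum_j (-1)^j C_k^j \left(\frac{a+k-j}{n}\right)^r$, is an assumption about how the pattern continues; it reduces to the three expressions above for $k = 0, 1, 2$.

```python
from fractions import Fraction
from math import comb


def white_ball_distribution(n, a, r):
    """Return {x: y_{x,r}}: distribution of the number of white balls after r drawings."""
    y = {a: Fraction(1)}
    for _ in range(r):
        updated = {}
        for x, prob in y.items():
            # a white ball is drawn (probability x/n): the count is unchanged
            updated[x] = updated.get(x, 0) + prob * Fraction(x, n)
            # a black ball is drawn (probability (n-x)/n): one more white ball
            if x < n:
                updated[x + 1] = updated.get(x + 1, 0) + prob * Fraction(n - x, n)
        y = updated
    return y


def closed_form(n, a, k, r):
    """Assumed general term for y_{a+k,r}, continuing the pattern of the text."""
    return comb(n - a, k) * sum(
        (-1) ** j * comb(k, j) * Fraction(a + k - j, n) ** r for j in range(k + 1)
    )


dist = white_ball_distribution(6, 2, 5)
print({x: str(prob) for x, prob in sorted(dist.items())})
```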

12. If, in the problem of runs, $p$ is supposed to be $> \dfrac{r}{r+1}$, prove that the
probability of a run of $r$ in $n$ trials is greater than

$$1 - \left(\frac{p - p_1}{r - (r+1)p_1} + \frac{r(p + p_1)}{2}\right)\frac{p_1^{\,n+1}}{1 - p}$$

where $p_1 < \dfrac{r}{r+1}$ is a root of the equation

$$p_1^{\,r}(1 - p_1) = p^r(1 - p).$$
13. To find an asymptotic expression of probability for a run of r in n independent

T 

trials, if p ^ -» the following proposition is of importance: Imaginary and nega^ 

r 4- 1 

tive roots of the equation 

(1 — s)x‘^ — z -V s - 0; 0 < 8 ^ —-— 

n — 1 

are, in absolute value, greater than the root 72 > 1 of the equation 
(1 — s)72« — jB + fi cos — = 0. 


Prove the truth of this statement.

14. Given $s$ urns containing the same number $n$ of black and white balls in known
proportions, drawings are made in the following manner: first, a single ball is drawn
out of every urn; second, the ball drawn from the first urn is placed into the second;
that drawn from the second is placed in the third, and so on; finally, the ball drawn
from the last urn is placed in the first, so that again every urn contains $n$ balls.
Supposing that this operation is repeated $t$ times, find the probability of drawing a
white ball from the $x$th urn.

Solution. Let $y_{x,t}$ be the required probability. First, it can be shown that it
satisfies the equation

$$y_{x,t} = \left(1 - \frac{1}{n}\right)y_{x,t-1} + \frac{1}{n}\,y_{x-1,t-1}.$$

The initial probabilities $y_{1,0}, y_{2,0}, \ldots, y_{s,0}$ are known; and, moreover, the function
$y_{x,t}$ must satisfy a boundary condition of the periodic type, $y_{0,t} = y_{s,t}$. Hence,
applying Lagrange’s method, the following solution is found

$$y_{x,t} = \left(1 - \frac{1}{n}\right)^t\left\{f(x) + \frac{t}{1\cdot(n-1)}f(x-1) + \frac{t(t-1)}{1\cdot 2\,(n-1)^2}f(x-2) + \cdots\right\}$$

where

$$f(x) = y_{x,0} \quad\text{when}\ x > 0$$

and the definition is extended to $x \leq 0$ by setting

$$f(-x) = f(s - x).$$

If, to begin with, all urns contain the same number of white and black balls, so that
$f(x) = \text{const.} = p$, we shall have, no matter what $t$ is, $y_{x,t} = p$.
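The solution of Problem 14 lends itself to a direct check. The following sketch, with hypothetical function names and urns numbered from 0 instead of 1 (so the periodic condition becomes ordinary arithmetic modulo $s$), iterates the difference equation and compares the result with the closed form.

```python
from fractions import Fraction
from math import comb

def step(y, n):
    """One round: urn x keeps n - 1 of its balls and receives one from urn x - 1."""
    s = len(y)
    # y[x - 1] with x = 0 reads the last urn, which is the periodic boundary condition
    return [(1 - Fraction(1, n)) * y[x] + Fraction(1, n) * y[x - 1] for x in range(s)]

def closed_form(f, n, t, x):
    """(1 - 1/n)^t * sum_k C(t, k) (n - 1)^(-k) f(x - k), indices taken modulo s."""
    s = len(f)
    return (1 - Fraction(1, n)) ** t * sum(
        Fraction(comb(t, k), (n - 1) ** k) * f[(x - k) % s] for k in range(t + 1)
    )

f = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]  # assumed initial proportions
y = f
for _ in range(7):
    y = step(y, 5)
print(y)
print([closed_form(f, 5, 7, x) for x in range(3)])
```

With a constant initial proportion every round leaves the list unchanged, in agreement with the remark that closes the solution.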
CHAPTER VI


BERNOULLI’S THEOREM 

1. This chapter will be devoted to one of the most important and
beautiful theorems in the theory of probability, discovered by Jacob
Bernoulli and published with a proof remarkably rigorous (save for some
irrelevant limitations assumed in the proof) in his admirable posthumous
book “Ars conjectandi” (1713). This book is the first attempt at scientific
exposition of the theory of probability as a separate branch of
mathematical science.

If, in $n$ trials, an event $E$ occurs $m$ times, the number $m$ is called the
“frequency” of $E$ in $n$ trials, and the ratio $m/n$ receives the name of
“relative frequency.” Bernoulli’s theorem reveals an important probability
relation between the relative frequency of $E$ and its probability $p$.

Bernoulli’s Theorem. With the probability approaching 1 or certainty
as near as we please, we may expect that the relative frequency of an event E
in a series of independent trials with constant probability p will differ from
that probability by less than any given number $\epsilon > 0$, provided the number
of trials is taken sufficiently large.

In other words, given two positive numbers $\epsilon$ and $\eta$, the probability
$P$ of the inequality

$$\left|\frac{m}{n} - p\right| < \epsilon$$

will be greater than $1 - \eta$ if the number of trials is above a certain
limit depending upon $\epsilon$ and $\eta$.

Proof. Several proofs of this important theorem are known which
are shorter and simpler but less natural than Bernoulli’s original proof.
It is his remarkable proof that we shall reproduce here in modernized
form.

a. Denoting by $T_m$, as usual, the probability of $m$ successes in $n$ trials,
we shall show first that

$$\frac{T_{b+k}}{T_b} < \frac{T_{a+k}}{T_a} \tag{1}$$

if $b > a$ and $k > 0$. Since the ratio

$$\frac{T_{x+1}}{T_x} = \frac{n-x}{x+1}\cdot\frac{p}{q}$$

decreases as $x$ increases we have for $b > a$

$$\frac{T_{b+1}}{T_b} < \frac{T_{a+1}}{T_a} \quad\text{or}\quad \frac{T_{b+1}}{T_{a+1}} < \frac{T_b}{T_a}.$$

Changing $b$, $a$, respectively, into $b+1$, $a+1$; $b+2$, $a+2$; $\ldots$; $b+k-1$,
$a+k-1$, it follows from the last inequality that

$$\frac{T_{b+k}}{T_{a+k}} < \frac{T_{b+k-1}}{T_{a+k-1}} < \cdots < \frac{T_{b+1}}{T_{a+1}} < \frac{T_b}{T_a},$$

that is,

$$\frac{T_{b+k}}{T_b} < \frac{T_{a+k}}{T_a}.$$

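b. Integers $\lambda$ and $\mu$ being determined by the inequalities

$$\lambda - 1 < np \leq \lambda, \qquad \mu - 1 < np + n\epsilon \leq \mu$$

the probabilities $A$ and $C$ of the inequalities

$$0 \leq \frac{m}{n} - p < \epsilon \quad\text{and}\quad \frac{m}{n} - p \geq \epsilon$$

are represented, respectively, by the sums

$$A = T_\lambda + T_{\lambda+1} + \cdots + T_{\mu-1}$$

$$C = T_\mu + T_{\mu+1} + \cdots + T_n$$

the first of which contains $\mu - \lambda = g$ terms. Combining terms of the
second sum into groups of $g$ terms (the last group may consist of less than
$g$ terms) and setting for brevity

$$A_1 = T_\mu + T_{\mu+1} + \cdots + T_{\mu+g-1}$$

$$A_2 = T_{\mu+g} + T_{\mu+g+1} + \cdots + T_{\mu+2g-1}$$

$$A_3 = T_{\mu+2g} + T_{\mu+2g+1} + \cdots + T_{\mu+3g-1}$$

we shall have

$$C = A_1 + A_2 + A_3 + \cdots$$

and at the same time

$$\frac{A_1}{A} < \frac{T_\mu}{T_\lambda}, \qquad \frac{A_2}{A_1} < \frac{T_\mu}{T_\lambda}, \qquad \frac{A_3}{A_2} < \frac{T_\mu}{T_\lambda}, \ \ldots \tag{2}$$

The ratio

$$\frac{A_1}{A} = \frac{T_{\lambda+g} + T_{\lambda+g+1} + \cdots + T_{\lambda+2g-1}}{T_\lambda + T_{\lambda+1} + \cdots + T_{\lambda+g-1}}$$

is less than the greatest of numbers

$$\frac{T_{\lambda+g}}{T_\lambda}, \quad \frac{T_{\lambda+g+1}}{T_{\lambda+1}}, \quad \ldots, \quad \frac{T_{\lambda+2g-1}}{T_{\lambda+g-1}}.$$

But by inequality (1)

$$\frac{T_{\lambda+g}}{T_\lambda} > \frac{T_{\lambda+g+1}}{T_{\lambda+1}} > \cdots > \frac{T_{\lambda+2g-1}}{T_{\lambda+g-1}};$$

hence, since $\lambda + g = \mu$,

$$\frac{A_1}{A} < \frac{T_\mu}{T_\lambda}.$$

Similarly,

$$\frac{A_2}{A_1} < \frac{T_{\mu+g}}{T_\mu}$$

and again by inequality (1)

$$\frac{T_{\mu+g}}{T_\mu} < \frac{T_{\lambda+g}}{T_\lambda} = \frac{T_\mu}{T_\lambda}.$$

Consequently

$$\frac{A_2}{A_1} < \frac{T_\mu}{T_\lambda}.$$

In the same way

$$\frac{A_3}{A_2} < \frac{T_{\mu+2g}}{T_{\mu+g}}, \qquad \frac{T_{\mu+2g}}{T_{\mu+g}} < \frac{T_{\mu+g}}{T_\mu} < \frac{T_\mu}{T_\lambda}, \qquad \frac{A_3}{A_2} < \frac{T_\mu}{T_\lambda}$$

and inequalities (2) are established.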
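c. For $x \geq \lambda$

$$\frac{T_{x+1}}{T_x} < 1.$$

It suffices to show that

$$\frac{T_{\lambda+1}}{T_\lambda} < 1.$$

As $\lambda \geq np$

$$\frac{T_{\lambda+1}}{T_\lambda} = \frac{n-\lambda}{\lambda+1}\cdot\frac{p}{q}, \qquad \frac{n-\lambda}{\lambda+1}\cdot\frac{p}{q} \leq \frac{npq}{npq+q} < 1$$

which shows that $\frac{T_{\lambda+1}}{T_\lambda} < 1$.

The inequality just established shows that in the following expression:

$$\frac{T_\mu}{T_\lambda} = \frac{T_\mu}{T_{\mu-1}}\cdot\frac{T_{\mu-1}}{T_{\mu-2}}\cdots\frac{T_{\lambda+1}}{T_\lambda}$$

all the factors are $< 1$. Consequently, if we retain $\alpha \leq g$ first factors
only, replacing the others by 1, we get

$$\frac{T_\mu}{T_\lambda} \leq \frac{T_\mu}{T_{\mu-1}}\cdot\frac{T_{\mu-1}}{T_{\mu-2}}\cdots\frac{T_{\mu-\alpha+1}}{T_{\mu-\alpha}}.$$

Moreover,

$$\frac{T_\mu}{T_{\mu-1}} < \frac{T_{\mu-1}}{T_{\mu-2}} < \cdots < \frac{T_{\mu-\alpha+1}}{T_{\mu-\alpha}}$$

whence the following important inequality results:

$$\frac{T_\mu}{T_\lambda} < \left(\frac{n-\mu+\alpha}{\mu-\alpha+1}\cdot\frac{p}{q}\right)^{\alpha}. \tag{3}$$

Here $\alpha$ is an arbitrary positive integer $\leq g$.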

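Now, let $\epsilon$ be an arbitrary positive number. Then we can show that
for

$$n \geq \frac{\alpha(1+\epsilon) - q}{\epsilon(p+\epsilon)} \tag{4}$$

we have both

$$\text{(i)}\quad \frac{n-\mu+\alpha}{\mu-\alpha+1}\cdot\frac{p}{q} \leq \frac{p}{p+\epsilon}$$

and

$$\text{(ii)}\quad \alpha \leq g.$$

Since $\mu \geq np + n\epsilon$, it suffices to show that (i) is satisfied for $\mu = np + n\epsilon$.
If $\mu = np + n\epsilon$ inequality (i) is equivalent to

$$\frac{nq - n\epsilon + \alpha}{np + n\epsilon - \alpha + 1} \leq \frac{q}{p+\epsilon}$$

or, after obvious simplifications,

$$n\epsilon(p+\epsilon) \geq \alpha(1+\epsilon) - q.$$

But this inequality follows from (4). To establish (ii), since $\alpha$ and $g$
are integers, it suffices to show that $\alpha < g + 1$. But $\mu \geq np + n\epsilon$,
$\lambda < np + 1$ and consequently $g + 1 > n\epsilon$. Hence (ii) will be established
if we can show that $n\epsilon \geq \alpha$, which by virtue of (4) will be true if

$$\frac{\alpha(1+\epsilon) - q}{p+\epsilon} \geq \alpha$$

that is, if

$$\alpha(1+\epsilon) - q \geq \alpha p + \alpha\epsilon$$

or $\alpha q - q \geq 0$, which is obviously true, $\alpha$ being a positive integer.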

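d. The auxiliary integer $\alpha$ is still at our disposal. Given an arbitrary
positive number $\eta < 1$ we shall determine $\alpha$ as the least integer satisfying
the inequality

$$\alpha \geq \frac{\log\frac{1}{\eta}}{\log\left(1 + \frac{\epsilon}{p}\right)}.$$

At the same time

$$\alpha < 1 + \frac{\log\frac{1}{\eta}}{\log\left(1 + \frac{\epsilon}{p}\right)}$$

and since $\log\left(1 + \frac{\epsilon}{p}\right) > \frac{\epsilon}{p+\epsilon}$ we shall have

$$\alpha < 1 + \frac{p+\epsilon}{\epsilon}\log\frac{1}{\eta}.$$

Consequently,

$$\frac{\alpha(1+\epsilon) - q}{\epsilon(p+\epsilon)} < \frac{1+\epsilon}{\epsilon^2}\log\frac{1}{\eta} + \frac{1}{\epsilon}$$

and, if

$$n \geq \frac{1+\epsilon}{\epsilon^2}\log\frac{1}{\eta} + \frac{1}{\epsilon} \tag{5}$$

then by virtue of (i) and (3)

$$\frac{T_\mu}{T_\lambda} < \eta$$

and by virtue of (2)

$$A_1 < A\eta, \qquad A_2 < A_1\eta < A\eta^2, \qquad A_3 < A_2\eta < A\eta^3, \ \ldots$$

whence

$$C < A\eta + A\eta^2 + A\eta^3 + \cdots = \frac{A\eta}{1-\eta}. \tag{6}$$

This inequality holds if $n$ satisfies (5). No trace of the auxiliary
integer $\alpha$ is left.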
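Inequality (6) can be tested on a small case. The sketch below computes the sums $A$ and $C$ exactly for one assumed set of values ($n = 264$, $p = 1/2$, $\epsilon = \eta = 1/10$, chosen because this $n$ satisfies (5)); the function names are illustrative only.

```python
from fractions import Fraction
from math import comb, ceil

def binomial_terms(n, p):
    """Exact probabilities T_m of m successes in n trials."""
    q = 1 - p
    return [comb(n, m) * p**m * q**(n - m) for m in range(n + 1)]

def tail_check(n, p, eps, eta):
    """Return A, C and whether C < A * eta / (1 - eta)."""
    T = binomial_terms(n, p)
    lam = ceil(n * p)            # lambda - 1 < np <= lambda
    mu = ceil(n * (p + eps))     # mu - 1 < np + n*eps <= mu
    A = sum(T[lam:mu])           # T_lambda + ... + T_{mu-1}
    C = sum(T[mu:])              # T_mu + ... + T_n
    return A, C, C < A * eta / (1 - eta)

A, C, holds = tail_check(264, Fraction(1, 2), Fraction(1, 10), Fraction(1, 10))
print(float(A), float(C), holds)
```

In this instance $C$ falls far below the limit $A\eta/(1-\eta)$, which already suggests that the condition (5) is much stronger than necessary.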

e. Let us now consider the inequalities

$$-\epsilon < \frac{m}{n} - p < 0 \quad\text{and}\quad \frac{m}{n} - p \leq -\epsilon$$

and introduce their respective probabilities $B$ and $D$. These inequalities
are equivalent to

$$0 < \frac{n-m}{n} - q < \epsilon \quad\text{and}\quad \frac{n-m}{n} - q \geq \epsilon.$$

It is apparent that we can interpret $B$ or $D$ as probabilities that the number
of occurrences $m' = n - m$ of the event $F$ opposite to $E$ in $n$ trials will
satisfy either the inequality $0 < \frac{m'}{n} - q < \epsilon$ or $\frac{m'}{n} - q \geq \epsilon$. Since
the right-hand side of (5) contains only given numbers $\epsilon$, $\eta$ it is clear that

$$D < \frac{B\eta}{1-\eta} \tag{7}$$

if (5) is satisfied.

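Now $A + B = P$ is the probability of the inequality

$$\left|\frac{m}{n} - p\right| < \epsilon$$

and $C + D = Q$ is the probability of the opposite inequality

$$\left|\frac{m}{n} - p\right| \geq \epsilon.$$

Hence $P + Q = 1$. Moreover, by (6) and (7)

$$Q < \frac{P\eta}{1-\eta}.$$

Consequently,

$$P\left(1 + \frac{\eta}{1-\eta}\right) > 1, \qquad P > 1 - \eta$$

if only

$$n \geq \frac{1+\epsilon}{\epsilon^2}\log\frac{1}{\eta} + \frac{1}{\epsilon}.$$

This completes the proof of Bernoulli’s theorem.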


For example, if $p = \frac{1}{2}$, $\epsilon = 0.01$, $\eta = 0.001$ we get from (5)

$$n \geq 69{,}869$$

which shows that in 69,869 trials or more there are at least 999 chances
against 1 that the relative frequency will differ from $\frac{1}{2}$ by less than $\frac{1}{100}$.
The number 69,869 found as a lower limit of the number of trials is
much too large. A much smaller number of trials would suffice to fulfill
all the requirements. From a practical standpoint, it is important to
find as low a limit as possible for the necessary number of trials (given $\epsilon$
and $\eta$). With this problem we shall deal in the next chapter.
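The limit given by (5) is easy to evaluate for other values of $\epsilon$ and $\eta$. The short sketch below assumes natural logarithms (the assumption under which the figure 69,869 comes out) and uses a hypothetical function name.

```python
from math import ceil, log

def bernoulli_bound(eps, eta):
    """Smallest integer n allowed by inequality (5); log is the natural logarithm."""
    return ceil((1 + eps) / eps**2 * log(1 / eta) + 1 / eps)

# (eps, eta) pairs chosen for illustration; the first is the example of the text
for eps, eta in [(0.01, 0.001), (0.05, 0.01), (0.1, 0.1)]:
    print(eps, eta, bernoulli_bound(eps, eta))
```

The limit grows like $1/\epsilon^2$ but only logarithmically in $1/\eta$, so demanding greater certainty is cheap compared with demanding greater precision.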

2. Bernoulli’s theorem states that for arbitrarily given $\epsilon$ and $\eta$ there
exists a number $n_0(\epsilon, \eta)$ such that for any single value $n > n_0(\epsilon, \eta)$ the
probability of the inequality

$$\left|\frac{m}{n} - p\right| < \epsilon$$

will be greater than $1 - \eta$. The question naturally arises, whether for
given $\epsilon$ and $\eta$ a number $N(\epsilon, \eta)$ depending upon $\epsilon$ and $\eta$ can be found such
that the probability of simultaneous inequalities

$$\left|\frac{m}{n} - p\right| < \epsilon$$

for all $n > N(\epsilon, \eta)$ will still be greater than $1 - \eta$. The following theorem
due to Cantelli shows that this question can be answered positively.

Cantelli’s Theorem. For given $\epsilon < 1$, $\eta < 1$ let N be an integer
satisfying the inequality

$$N > \frac{2}{\epsilon^2}\log\frac{4}{\epsilon^2\eta} + 2.$$

The probability that the relative frequencies of an event E will differ from
p by less than $\epsilon$ in the Nth and all the following trials is greater than $1 - \eta$.

Proof. We shall prove first that the probability $Q_n$ of the inequality

$$\left|\frac{m}{n} - p\right| \geq \epsilon$$

will always be less than $2e^{-\frac{1}{2}n\epsilon^2}$. According to results proved in the
preceding section for any $\eta > 0$

$$Q_n < \eta$$

if

$$n \geq \frac{1+\epsilon}{\epsilon^2}\log\frac{1}{\eta} + \frac{1}{\epsilon}.$$

This inequality, if we take $\eta = 2e^{-\frac{1}{2}n\epsilon^2}$, becomes

$$n \geq \frac{1+\epsilon}{2}\,n + \frac{1}{\epsilon} - \frac{1+\epsilon}{\epsilon^2}\log 2$$

and in this form it is evident, since for $\epsilon < 1$

$$1 - \frac{1+\epsilon}{\epsilon}\log 2 < 1 - 2\log 2 < 0.$$

Hence, as stated,

$$Q_n < 2e^{-\frac{1}{2}n\epsilon^2}. \tag{8}$$

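The event $A$, in which we are interested, consists in simultaneous
fulfillment of all the inequalities

$$\left|\frac{m}{n} - p\right| < \epsilon$$

for $n = N, N+1, N+2, \ldots$. The opposite event $B$ consists in
the fulfillment of at least one of the inequalities

$$\left|\frac{m}{n} - p\right| \geq \epsilon$$

where $n$ can coincide either with $N$, or with $N+1$, or with $N+2$, $\ldots$.
The probability of $B$, which we shall denote by $R$, certainly does not
exceed the sum of the probabilities of all the inequalities

$$\left|\frac{m}{n} - p\right| \geq \epsilon$$

for $n = N, N+1, N+2, \ldots$. Consequently, referring to (8),

$$R < 2\sum_{n=N}^{\infty} e^{-\frac{1}{2}n\epsilon^2} = \frac{2e^{-\frac{1}{2}N\epsilon^2}}{1 - e^{-\frac{1}{2}\epsilon^2}}.$$

To satisfy the inequality

$$\frac{2e^{-\frac{1}{2}N\epsilon^2}}{1 - e^{-\frac{1}{2}\epsilon^2}} < \eta$$

it suffices to take

$$N > \frac{2}{\epsilon^2}\log\frac{2}{\eta} + \frac{2}{\epsilon^2}\log\frac{1}{1 - e^{-\frac{1}{2}\epsilon^2}}.$$

Now

$$\frac{2}{\epsilon^2}\log\frac{1}{1 - e^{-\frac{1}{2}\epsilon^2}} < \frac{2}{\epsilon^2}\log\frac{2}{\epsilon^2} + 2.$$

Consequently, if

$$N > \frac{2}{\epsilon^2}\log\frac{4}{\epsilon^2\eta} + 2$$

we shall have $R < \eta$ and at the same time the probability of $A$ will be
greater than $1 - \eta$, which proves Cantelli’s theorem.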
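As a numerical illustration of Cantelli's theorem, the sketch below computes the least admissible $N$ for assumed values of $\epsilon$ and $\eta$, and confirms that the geometric-series limit for $R$ used in the proof is then below $\eta$. The function names are hypothetical and natural logarithms are assumed.

```python
from math import exp, floor, log

def cantelli_N(eps, eta):
    """Smallest integer N with N > (2/eps^2) * log(4/(eps^2 * eta)) + 2."""
    limit = 2 / eps**2 * log(4 / (eps**2 * eta)) + 2
    return floor(limit) + 1

def series_limit(N, eps):
    """The quantity 2 e^{-N eps^2/2} / (1 - e^{-eps^2/2}) that bounds R in the proof."""
    return 2 * exp(-N * eps**2 / 2) / (1 - exp(-eps**2 / 2))

eps, eta = 0.1, 0.01     # assumed values for illustration
N = cantelli_N(eps, eta)
print(N, series_limit(N, eps))
```

The margin between the computed limit and $\eta$ is narrow, which shows that the constant 2 added in the theorem is close to what the estimate of the logarithm actually requires.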


Significance of Bernoulli’s Theorem 

3. As was indicated in the Introduction, one of the most important
problems in the theory of probability consists in the discovery of cases
where the probability is very near to 0 or, on the contrary, very near to 1,
because cases with very small or very great probability may have real
practical interest. In Bernoulli’s theorem we have a case of this kind;
the theorem shows that with the probability approaching as near to 1
or certainty as we please, we may expect that in a sufficiently long
series of independent trials with constant probability, the relative
frequency of an event will differ from that probability by less than any
specified number, no matter how small. But it lies in the nature of the
idea of mathematical probability, that when it is near 1, or, on the
contrary, very small, we may consider an event with such probability as
practically certain in the first case, and almost impossible in the second.
The reason is purely empirical.

To illustrate what we mean, let us consider an indefinite series of
independent trials, in which the probability of a certain event remains
constantly equal to $\frac{1}{2}$. It can be shown that if the number of trials
is, for instance, 40,000 or more, we may expect with a probability $> 0.999$
that the relative frequency of the event will differ from $\frac{1}{2}$ by less than
0.01. In other words, we are entitled to bet at least 999 against 1 that
the actual number of occurrences will lie between the limits $0.49n$ and
$0.51n$ if $n \geq 40{,}000$. If we could make a positive statement of this
kind without any mention of probability, we should be offering an ideal
scientific prediction. However, our knowledge in this case is incomplete
and all we are entitled to state is this: we are more sure to be right in
predicting the above limits for the number of occurrences than in
expecting to draw a white ball from an urn containing 999 white and only 1
black ball.
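The statement about 40,000 trials can be confirmed by summing the binomial terms directly. The sketch below is one possible way of doing so for $p = \frac{1}{2}$, using exact integers and a single final division; the function name and the comparison value $n = 10{,}000$ are assumptions made for illustration.

```python
from math import comb

def prob_within(n, lo, hi):
    """Exact probability that lo < m < hi for m successes in n trials with p = 1/2."""
    term = comb(n, lo + 1)                 # C(n, lo + 1)
    total = 0
    for m in range(lo + 1, hi):
        total += term
        term = term * (n - m) // (m + 1)   # C(n, m + 1) from C(n, m), exact division
    return total / 2**n

print(prob_within(40_000, 19_600, 20_400))   # limits 0.49n and 0.51n
print(prob_within(10_000, 4_900, 5_100))     # same relative limits, fewer trials
```

Each binomial coefficient is derived from the preceding one, which avoids recomputing very large coefficients from scratch.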

In practical matters, where our actions almost never can be directed
with perfect confidence, even incomplete knowledge may be taken as a
sure guide. Whoever has tried to win on a single ticket out of 10,000
knows from experience that it is virtually impossible. Now the conviction
of impossibility would be still greater if one tried to win on a single
ticket out of 1,000,000.

In the light of such examples, we understand what value may be
attached to statements derived from Bernoulli's theorem: Although the
fact we expect is not bound to happen, the probability of its happening
is so great that it may really be considered as certain. Once in a great
while facts may happen contrary to our expectations, but such rare
exceptions cannot outweigh the advantages in everyday life of following the
indications of Bernoulli's theorem. And herein lies its immense practical
value and the justification of a science like the theory of probability.

It should, however, be borne in mind that little, if any, value can be
attached to practical applications of Bernoulli’s theorem, unless the
conditions presupposed in this theorem are at least approximately
fulfilled: independence of trials and constant probability of an event for
every trial. And in questions of application it is not easy to be sure
whether one is entitled to make use of Bernoulli’s theorem; consequently,
it is too often used illegitimately.

It is easy to understand how essential it is to discover propositions 
of the same character under more general conditions, paying especial 
attention to the possible dependence of trials. There have been valuable 
achievements in this direction. In the proper place, we shall discuss the 
more important generalizations of Bernoulli’s theorem. 

4. When the probability of an event in a single experiment is known,
Bernoulli’s theorem may serve as a guide to indicate approximately how
often this event can be expected to occur if the same experiments are
repeated a considerable number of times under nearly the same conditions.
When, on the contrary, the probability of an event is unknown
and the number of experiments is very large, the relative frequency of
that event may be taken as an approximate value of its probability.
Bernoulli himself, in establishing his theorem, had in mind the approximate
evaluation of unknown probabilities from repeated experiments.
That is evident from his explanations preceding the statement of the
theorem itself and its proof. Inasmuch as these explanations are interesting
in themselves, and present the original thoughts of the great discoverer,
we deem it advisable here to give a free translation from Bernoulli’s
book. After calling attention to the fact that only in a few cases can
probabilities be found a priori, Bernoulli proceeds as follows:

So, for example, the number of cases for dice is known. Evidently there are
as many cases for each die as there are faces, and all these cases have an equal
chance to materialize. For, by virtue of the similitude of faces and the uniform
distribution of weight in a die, there is no reason why one face should show up
more readily than another, as there would be if the faces had a different shape
or if one part of a die were made of heavier material than another. So one knows
the number of cases when a white or a black ticket can be drawn from an urn,
and besides, it is known that all these cases are equally possible, because the
numbers of tickets of both kinds are determined and known, and there is no apparent
reason why one of these tickets could be drawn more readily than any other.
But, I ask you, who among mortals will ever be able to define as so many cases,
the number, e.g., of the diseases which invade innumerable parts of the human
body at any age and can cause our death? And who can say how much more
easily one disease than another (plague than dropsy, dropsy than fever) can
kill a man, to enable us to make conjectures about the future state of life or
death? Who, again, can register the innumerable cases of changes to which the
air is subject daily, to derive therefrom conjectures as to what will be its state
after a month or even after a year? Again, who has sufficient knowledge of the
nature of the human mind or of the admirable structure of our body to be able,
in games depending on acuteness of mind or agility of body, to enumerate cases
in which one or another of the participants will win? Since such and similar
things depend upon completely hidden causes, which, besides, by reason of the
innumerable variety of combinations will forever escape our efforts to detect
them, it would plainly be an insane attempt to get any knowledge in this fashion.

However, there is another way to obtain what we want. And what is impossible
to get a priori, at least can be found a posteriori; that is, by registering the
results of observations performed a great many times. Because it must be
presumed that something may occur or not occur as many times as it had previously
been observed to occur or not occur under similar conditions. For instance, if,
in the past, 300 men of the same age and physical build as Titus is now, were
investigated, and it were found that 200 of them had died within a decade, the
others continuing to enjoy life past this term, one could pretty safely conclude
that there are twice as many cases for Titus to pay his debt to nature within the
next decade than to survive beyond this term. So it is, if somebody for many
preceding years had observed the weather and noticed how many times it was
fair or rainy; or if somebody attended games played by two persons a great many
times and noticed how often one or the other won; by these very observations he
would be able to discover the ratio of cases which in the future might favor the
occurrence or failure of the same event under similar circumstances.

And this empirical way of determining the number of cases by experiments is
neither new nor unusual. For the author of the book “Ars cogitandi,” a man
of great acumen and ingenuity, in Chap. 12 recommends a similar procedure,
and everybody does the same in daily practice. Moreover, it cannot be
concealed that for reasoning in this fashion about some event, it is not sufficient to
make a few experiments, but a great quantity of experiments is required; because
even the most stupid ones by some natural instinct and without any previous
instruction (which is rather remarkable) know that the more experiments are
made, the less is the danger to miss the scope.

Although this is naturally known to anyone, the proof based on scientific
principles is by no means trivial, and it is our duty now to explain it. However,
I would consider it a small achievement if I could only prove what everybody
knows anyway. There remains something else to be considered, which perhaps
nobody has even thought of. Namely, it remains to inquire, whether by thus
augmenting the number of experiments the probability of getting a genuine ratio
between numbers of cases, in which some event may occur or fail, also augments
itself in such a manner as finally to surpass any given degree of certitude; or
whether the problem, so to speak, has its own asymptote; that is, there exists a
degree of certitude which never can be surpassed no matter how the observations
are multiplied; for instance, that it never is possible to have a probability greater
than 1/2 or 2/3 or 3/4 that the real ratio has been attained. To illustrate this by an
example, suppose that, without your knowledge, 3,000 white stones and 2,000
black stones are concealed in a certain urn, and you try to discover their numbers
by drawing one stone after another (each time putting back the stone drawn
before taking the next one, in order not to change the number of stones in the
urn) and notice how often a white or a black stone appears. The question is,
can you make so many drawings as to make it 10, or 100, or 1,000, etc., times
more probable (that is, morally certain) that the ratio of frequencies of white and
black stones will be 3 to 2, as is the case with the number of stones in the urn,
than any other ratio different from that? If this were not true, I confess nothing
would be left of our attempt to explore the number of cases by experiments.
But if this can be attained and moral certitude can finally be acquired (how that
can be done I shall show in the next chapter), we shall have cases enumerated a
posteriori with almost the same confidence as if they were known a priori. And
that, for practical purposes, where “morally certain” is taken for “absolutely
certain” by Axiom 9, Chap. II, is abundantly sufficient to direct our conjectures
in any contingent matter not less scientifically than in games of chance.

For if instead of an urn we take the air or the human body, that contain in 
themselves sources of various changes or diseases as the urn contains stones, we 
shall be able in the same manner to determine by observations how much more 
likely one event is to happen than another in these subjects. 

To avoid misunderstanding, one must bear in mind that the ratio of cases
which we want to determine by experiments should not be taken in the sense of a
precise and indivisible ratio (for then just the contrary would happen, and the
probability of attaining a true ratio would diminish with the increasing number of
observations) but as an approximate one; that is, within two limits, which,
however, can be taken as near as we wish to each other. For instance, if, in the
case of the stones, we take pairs of ratios 301/200 and 299/200, or 3001/2000 and
2999/2000, etc., it can be shown that it will be more probable than any degree of
probability that the ratio found in experiments will fall within these limits than
outside of them. Such, therefore, is the problem which we have decided to
publish here, now that we have struggled with it for about twenty years. The
novelty of this problem as well as its great utility, combined with equal difficulty,
may add to the weight and value of other parts of this doctrine. (“Ars
Conjectandi,” pars quarta, Cap. IV, pp. 224-227.)

Application to Games of Chance 

5. One of the cases in which the conditions for application of
Bernoulli's theorem are fulfilled is that of games of chance. It is not out
of place to discuss the question of the commercial values of games from
the standpoint of Bernoulli's theorem. “Game of chance” is the term
we apply to any enterprise which may give us profit or may cause us
loss, depending on chance, the probabilities of gain or loss being known.
The following considerations can be applied, therefore, to more serious
questions and not only to games played for pastime or for the sake of
gaining money, as in gambling.

Suppose that, by the conditions of the game, a player can win a
certain sum $a$ of money, with the probability $p$; or can lose another
sum $b$ with the probability $q = 1 - p$.

If this game can be repeated any number of times under the same
conditions, the question arises as to the probability for a player to gain
or lose a sum of money not below a given limit. Let us denote by $n$
the total number of games, and by $m$ the number of times the player
wins. Considering a loss as a negative gain, his total gain will be

$$K = ma - (n - m)b.$$

It is convenient to introduce instead of $m$ another number $\alpha$ defined by

$$\alpha = m - np$$

and called “discrepancy.” Expressed in terms of $\alpha$ the preceding
expression for the gain becomes

$$K = n(pa - qb) + (a + b)\alpha.$$

The expression

$$E = pa - qb$$

entering as the coefficient of $n$ has, as we shall see, an important bearing
on the conclusion as to the commercial value of the game. It is called the
“mathematical expectation” of the player. Suppose at first that this
expectation is positive. By Bernoulli's theorem the probability for a
discrepancy less than $-n\epsilon$, $\epsilon$ being an arbitrary positive number, is
smaller than any given number, provided, of course, the number of games
is sufficiently large. At the same time, with the probability approaching
1 as near as we please, we may expect the discrepancy to be $\geq -n\epsilon$.
However, if this is the case, the total gain will surpass the number

$$n[E - \epsilon(a + b)]$$

which, for sufficiently large $n$, itself is greater than any specified positive
number. It is supposed, of course, that $\epsilon$ is small enough to make the
difference

$$E - \epsilon(a + b)$$

positive. And that means that the player whose mathematical expectation
is positive may expect with a probability approaching certainty as
near as we please to gain an arbitrarily large amount of money if nothing
prevents him from playing a sufficient number of games.

On the contrary, by a similar argument, we can see that in case of
a negative mathematical expectation, the player has an arbitrarily small
probability to escape a loss of an arbitrarily large amount of money,
again under the condition that he plays a sufficiently large number of
games.

Finally, if the mathematical expectation is 0, it is impossible to make 
any definite statement concerning the gain or loss by the player, except 
that it is very unlikely that the amount of gain or loss will be considerable 
compared with the number of games. 
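The relation between the total gain, the mathematical expectation, and the discrepancy can be illustrated with a few lines of arithmetic. In the sketch below the stakes $a = 2$, $b = 1$ and the probability $p = \frac{2}{5}$ are hypothetical values chosen only to give a positive expectation; the function names are likewise illustrative.

```python
from fractions import Fraction

def expectation(a, b, p):
    """Mathematical expectation E = pa - qb of a single game."""
    return p * a - (1 - p) * b

def gain(n, m, a, b):
    """Total gain K after n games of which m were won."""
    return m * a - (n - m) * b

def gain_from_discrepancy(n, m, a, b, p):
    """The same gain written as K = nE + (a + b) * alpha, with alpha = m - np."""
    alpha = m - n * p
    return n * expectation(a, b, p) + (a + b) * alpha

def guaranteed_gain(n, a, b, p, eps):
    """Lower limit n[E - eps(a + b)], valid whenever alpha >= -n * eps."""
    return n * (expectation(a, b, p) - eps * (a + b))

a, b, p = 2, 1, Fraction(2, 5)           # hypothetical game with E = 1/5
for n in (50, 500, 5000):
    print(n, guaranteed_gain(n, a, b, p, Fraction(1, 20)))
```

The lower limit grows in proportion to $n$ as long as $\epsilon(a + b)$ is kept below $E$, which is the whole content of the argument for a positive expectation.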

It follows from this discussion that the game is certainly favorable
for the player if his mathematical expectation is positive, and unfavorable
if it is negative. In case the mathematical expectation is 0, neither
of the parties participating in the game has a decided advantage and then
the game is called equitable. Usually, games serving as amusements are
equitable. On the contrary, all of the games operated for commercial
purposes by individuals or corporations are expressly made to be profitable
for the administration; that is, the mathematical expectation of the
administration of a game operated for lucrative purposes is positive at
each single turn of the game and, correspondingly, the expectation of any
gambler is negative. This confirms the common observation that those
gamblers who extend their gambling over large numbers of games are
almost inevitably ruined. At the same time, the theory agrees with
the fact that great profits are derived by the administrations of gaming
places.

A good illustration is afforded by the French lottery mentioned on 
page 19, which, as is well known, was a very profitable enterprise operated 
by the French government. Now, if we consider the mathematical 
expectation of ticket holders in that lottery, we find that it was negative 
in all cases; namely, denoting by M the sum paid for tickets, we find the 
following expectations: 

On 1 ticket (ii — 1)^ 

On 2 tickets (yj — 1)M = — 

On 3 tickets - l)M = 


and so forth, 
On the other hand, the expectation of the administration was always
positive, and because of the great number of persons taking part in this
lottery, the number of games played by the administration was enormous,
and it was assured of a steady and considerable income. This was an
enterprise avowedly operated for the purpose of gambling, but the same
principles underlie the operations of institutions having great public
value, such as insurance companies, which, to secure their income, always
reserve certain advantages for themselves.

Experimental Verification of Bernoulli's Theorem 

6. Bernoulli's theorem, like any other mathematical proposition, is
a deduction from ideal premises. To what extent these premises may be
considered as a good approximation to reality can be decided only by
experiments. Several experiments established for the purpose of testing
various theoretical statements derived from general propositions of the
theory of probability, are reported by different authors. Here we shall
discuss those purporting to test Bernoulli's theorem.

I. Buffon, the French naturalist of the eighteenth century, tossed a
coin 4,040 times and obtained 2,048 heads and 1,992 tails. Assuming
that his coin was ideal, we have a probability of 1/2 for either heads or
tails. Now, the relative frequencies obtained by his experiments are:

2048/4040 = 0.507 for heads
1992/4040 = 0.493 for tails

and they differ very little from the corresponding probabilities, 0.500.
The conclusions one might derive from Bernoulli's theorem
are verified in a very satisfactory manner.

II. De Morgan, in his book “Budget of Paradoxes” (1872), reports
the results of four similar experiments. In each of them a coin was
tossed 2,048 times and the observed frequencies of heads were,
respectively, 1,061, 1,048, 1,017, 1,039. The relative frequencies corresponding
to these numbers are

1061/2048 = 0.518; 1048/2048 = 0.512; 1017/2048 = 0.497; 1039/2048 = 0.507.

The agreement with the theory again is satisfactory.

III. Charlier, in his book “Grundzüge der mathematischen Statistik,”
reports the results of 10,000 drawings of one playing card out of a full
deck. Each card drawn was returned to the deck before the next drawing.
The actual result of these experiments was that black cards
appeared 4,933 times, and consequently the frequency of red cards was
5,067. The relative frequencies in this instance are:

4933/10000 = 0.4933 for a black card
5067/10000 = 0.5067 for a red card

and they differ but slightly from the probability, 0.5000, that the card
drawn will be black or red. The agreement between theory and experiment
in this case, too, is satisfactory.

IV. The author of this book made the following experiment with
playing cards: After excluding the 12 face cards from the pack, 4 cards
were drawn at a time from the remaining 40, and the number of trials
was carried to 7,000. The number of times in each thousand that the
four cards belonged to different suits, was:

    I    II   III    IV     V    VI   VII
  113   113   103   105   105   118   108

Altogether the frequency of such cases was 765 in 7,000 trials, whence
we find for the relative frequency

765/7000 = 0.1093

while the probability for taking 4 cards belonging to different suits is

1000/9139 = 0.1094.

V. In J. L. Coolidge’s “Introduction to Mathematical Probability,”
one finds a reference to an experiment made by Lieutenant R. S. Hoar,
U.S.A., but the reported results are incomplete. The author of this book
repeated the same experiment which consisted in 1,000 drawings of 5 cards
at a time, from a full pack of 52 cards. The results were: 503 times the
5 cards were each of different denominations; 436 times 2 were of the same
denomination with 3 scattered; 45 times there were 2 pairs of 2 different
denominations and 1 odd card; 14 times 3 were of the same denomination
with 2 scattered; 2 times there were 2 of one denomination and 3 of
another. The remaining possible combination, 4 cards of the same
denomination with 1 odd, never appeared. The probabilities of these
different cases are, respectively,

2112/4165 = 0.507; 352/833 = 0.423; 198/4165 = 0.048;

88/4165 = 0.021; 6/4165 = 0.001; 1/4165 = 0.000.

The corresponding theoretical frequencies are 507, 423, 48, 21, 1, 0,
while the observed frequencies were 503, 436, 45, 14, 2, 0. The
discrepancies are generally small and the greatest of them, 13, is still within
reasonable limits. Deeper investigation shows that the probability that
a discrepancy will not exceed 13 is about 3/5; hence, the observed deviation
of 13 units cannot be considered abnormal.
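The theoretical frequencies quoted for the experiment with 5 cards can be recomputed by counting hands. The sketch below uses the standard combinatorial counts for each pattern of denominations; the dictionary labels are descriptive names introduced here, not terms from the text.

```python
from fractions import Fraction
from math import comb

total = comb(52, 5)   # number of ways to draw 5 cards from a full pack

# number of 5-card hands for each pattern of denominations
counts = {
    "all different":    comb(13, 5) * 4**5,
    "one pair":         13 * comb(4, 2) * comb(12, 3) * 4**3,
    "two pairs":        comb(13, 2) * comb(4, 2)**2 * 11 * 4,
    "three alike":      13 * comb(4, 3) * comb(12, 2) * 4**2,
    "three and two":    13 * comb(4, 3) * 12 * comb(4, 2),
    "four alike":       13 * 48,
}

for name, c in counts.items():
    prob = Fraction(c, total)
    # theoretical frequency in 1,000 drawings, rounded to the nearest integer
    print(f"{name:15s} {prob}  {round(1000 * prob)}")
```

Since the six patterns exhaust all possibilities, the counts must add up to the total number of hands, which gives a convenient check on the enumeration.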

VI. Bancroft H. Brown published, in the American Mathematical 
Monthly, vol. 26, page 351, the results of a series of 9,900 games of craps. 
This game is played with two dice, and the caster wins unconditionally 
if he produces 7 or 11 points, which are called "naturals"; he loses the 





game in case of 2, 3, or 12 points, called "craps." But if he produces 
4, 5, 6, 8, 9, or 10 "points," he does not win, but has the right to cast the 
dice an unlimited number of times until he throws the same number of 
points that he had before, or until he throws a 7. If he throws 7 before 
obtaining his point, he loses the game; otherwise he wins. 

It is a good exercise to find the probability of winning this game. 
It is 

$$\frac{244}{495} = 0.493$$

that is, a little less than 1/2. Multiplying the number of games, in our 
case 9,900, by this probability, we find that the theoretical number of 
successes is 4,880 and of failures, 5,020. Now, according to Bancroft H. 
Brown, the actual numbers of successes and losses are, respectively, 
4,871 and 5,029. The discrepancy 

$$4871 - 4880 = -9$$

is extremely small, even smaller than could reasonably be expected. 
The same article gives the number of times "craps" were produced; 
namely, 2 appeared 259 times, 3 appeared 508 times, and 12 appeared 
293 times, making the total number of craps 1,060. The probability 
of obtaining craps is 

$$\frac{4}{36} = \frac{1}{9};$$

hence, the theoretical number of craps should be 1,100. The discrepancy, 
$1060 - 1100 = -40$, is more considerable this time but still lies within 
reasonable limits. 
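
The exercise of finding the probability of winning at craps can be sketched as follows, assuming two fair dice and the rules exactly as described above. Once a point is established, only the point itself or a 7 ends the game, so the chance of making the point is the ratio of the two probabilities; this shortcut is a standard argument and not necessarily the route the author intended.

```python
from fractions import Fraction


def two_dice_distribution() -> dict:
    """Probability of each total for two fair dice."""
    dist = {}
    for a in range(1, 7):
        for b in range(1, 7):
            dist[a + b] = dist.get(a + b, 0) + Fraction(1, 36)
    return dist


def craps_probabilities():
    """Return (probability of winning, probability of throwing craps)."""
    d = two_dice_distribution()
    win = d[7] + d[11]  # naturals win at once
    for point in (4, 5, 6, 8, 9, 10):
        # Probability of the point on the first cast, then of repeating
        # the point before a 7 appears.
        win += d[point] * d[point] / (d[point] + d[7])
    craps = d[2] + d[3] + d[12]
    return win, craps


win, craps = craps_probabilities()
print(win, float(win), 9900 * win)      # theoretical number of successes
print(craps, 9900 * craps)              # theoretical number of craps
```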

VII. E. Czuber made a complete investigation of lotteries operated 
on the same plan as the French lottery, in Prague between 1754 and 1886, 
and in Brünn between 1771 and 1886. The number of drawings was 
2,854 in Prague and 2,703 in Brünn. The probability that in each drawing
the sequence of numbers is either increasing or decreasing, is 

$$\frac{1}{60} = 0.01667$$

while the observed relative frequency of such cases was 

Prague: 0.01612; Brünn: 0.01739 

and in both places combined 

0.01674. 

The probabilities that among five numbers in each drawing there is 
none or only one of the numbers 1, 2, 3, ... 9, are, respectively, 

0.58298 and 0.34070. 

The corresponding relative frequencies were 

Prague: 0.58655 and 0.32656 
Brünn: 0.57899 and 0.34591 

and in both places combined 

0.58183 and 0.33587, respectively. 

The probability of drawing a determined number is 1/18. Now, according 
to Czuber, for the lottery in Prague the actual number of occurrences for 
single tickets varied from 138 (for No. 6) to 189 (for No. 83), so that for 
all tickets the discrepancy varied from -20 to 31. Besides, there were 
only 16 numbers with a discrepancy greater than 15 in absolute value. 
All these results stand in good accord with the theory. 
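
The theoretical values in this example can be recomputed by counting. The sketch below assumes the French lottery plan of drawing 5 numbers out of 90 without replacement; the function and parameter names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def lottery_probabilities(pool: int = 90, drawn: int = 5, low: int = 9):
    """Probabilities used in the comparison with Czuber's data."""
    # Of the drawn! equally likely orders, one is increasing and one decreasing.
    monotone = Fraction(2, factorial(drawn))
    total = comb(pool, drawn)
    # None, or exactly one, of the numbers 1..low among the drawn numbers.
    none_low = Fraction(comb(pool - low, drawn), total)
    one_low = Fraction(low * comb(pool - low, drawn - 1), total)
    # A determined number is among the drawn ones.
    single = Fraction(drawn, pool)
    return monotone, none_low, one_low, single


monotone, none_low, one_low, single = lottery_probabilities()
print(float(monotone), float(none_low), float(one_low), single)
print(2854 * float(single))  # expected appearances of one number in Prague
```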

VIII. One of the most striking experimental tests of Bernoulli's 
theorem was made in connection with a problem considered for the first 
time by Buffon. A board is ruled with a series of equidistant parallel 
lines, and a very fine needle, which is shorter than the distance between 
lines, is thrown at random on the board. Denoting by l the length of 
the needle and by h the distance between lines, the probability that the 
needle will intersect one of the lines (the other possibility is that the 
needle will be completely contained within the strip between two lines) is 
found to be 

$$p = \frac{2l}{\pi h}.$$

The remarkable thing about this expression is that it contains the 
number π = 3.14159 ... expressing the ratio of the circumference of a 
circle to its diameter. In the appendix we shall indicate how this expression
can be obtained, because in this problem we deal with a different 
concept of probability. 

Suppose we throw the needle a great many times and count the 
number of times it cuts the lines. By Bernoulli's theorem we may expect 
that the relative frequency of intersections will not differ greatly from 
the theoretical probability, so that, equating them, we have the means of 
finding an approximate value of π. 

One series of experiments of this kind was performed by R. Wolf, 
astronomer in Zurich, between 1849 and 1853. In his experiments the 
width of the strips was 45 mm., and the length of the needle was 36 mm. 
Thus the theoretical probability of intersections is 

$$\frac{72}{45\pi} = 0.5093.$$

The needle was thrown 5,000 times and it cut the lines 2,532 times; 
whence, the relative frequency 

$$\frac{2532}{5000} = 0.5064.$$

The agreement between the two numbers is very satisfactory. If, 
relying on Bernoulli's theorem, we set the approximate equation 

$$\frac{72}{45\pi} = 0.5064,$$

we should find the number 3.1596 for π, which differs from the known 
value of π by less than 0.02. 

In another experiment of the same kind reported by De Morgan in 
the aforementioned book, Ambrose Smith in 1855 made 3,204 trials with 
a needle the length of which was 3/5 of the distance between lines. There 
were 1,213 clear intersections, and 11 contacts on which it was difficult 
to decide. If on this ground, we should consider half of them as intersections,
we should obtain about 1,218 intersections in 3,204 trials, which 
would give the number 3.155 for π. If all of the contacts had been treated 
as intersections the result would have been 3.1412, very close to the 
real value of π. 

In an excellent book "Calcolo delle Probabilità," vol. 1, page 183, 
1925, by G. Castelnuovo, reference is made to experiments performed by 
Professor Reina under whose direction a needle of 3 cm. in length was 
thrown 2,520 times, the distance between lines being 6 cm. Taking into 
account the thickness of the needle, the probability of intersection was 
found to be 0.345, while actual experiments gave the relative frequency 
of intersections as 0.341. 
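
The estimates of π quoted for the experiments of Wolf and Smith follow from equating the relative frequency with 2l/(πh). A small sketch of that arithmetic, with a helper name `pi_estimate` chosen here for illustration:

```python
def pi_estimate(needle: float, spacing: float, hits: float, throws: int) -> float:
    """Solve hits/throws = 2*needle/(pi*spacing) for pi."""
    return 2.0 * needle * throws / (spacing * hits)


# Wolf: needle 36 mm, strips 45 mm, 2,532 intersections in 5,000 throws.
wolf = pi_estimate(36, 45, 2532, 5000)
# Smith: needle 3/5 of the spacing, 1,213 clear intersections and 11 contacts.
smith_half = pi_estimate(3, 5, 1213 + 11 / 2, 3204)
smith_all = pi_estimate(3, 5, 1213 + 11, 3204)
print(round(wolf, 4), round(smith_half, 3), round(smith_all, 4))
```

Counting half of the doubtful contacts gives 1,218.5 intersections, which the text rounds to "about 1,218".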

Appendix 

Buffon's Needle Problem. Let h be the width of the strip between 
two lines and l < h the length of the needle. The position of the needle 
can be determined by the distance x of its middle point from the nearest 
line and the acute angle φ formed by the needle and a perpendicular 
dropped from the middle point to the line. It is apparent that x may 
vary from 0 to h/2 and φ varies within the limits 0 and π/2. We cannot 
define in the usual way the probability of the needle cutting the line, for 
there are infinitely many cases with respect to the position of the needle. 
However, it is possible to treat this problem as the limiting case of 
another problem with a finite number of possible cases, where the usual 
definition of probability can be applied. 

Suppose that h/2 is divided into an arbitrary number m of equal 
parts δ = h/2m and the right angle π/2 into n equal parts ω = π/2n. 
Suppose, further, that the distance x may have only the values 

$$0, \ \delta, \ 2\delta, \ \ldots, \ m\delta$$

and the angle φ the values 

$$0, \ \omega, \ 2\omega, \ \ldots, \ n\omega.$$

This gives 

$$N = (m + 1)(n + 1)$$

cases as to the position of the needle, and it is reasonable to assume that 
these cases are equally likely. To find the number of favorable cases, we 
notice that the needle cuts one of the lines if x and φ satisfy the inequality 

$$x < \frac{l}{2} \cos \varphi.$$


The number of favorable cases therefore, is equal to the number of 
systems of integers i, j satisfying the inequality 

$$(A) \qquad i\delta < \frac{l}{2} \cos j\omega$$

supposing that i may assume only the values 0, 1, 2, ... m and j only 
the values 0, 1, 2, ... n. Because we suppose l < h the greatest 
value of i satisfying condition (A) is less than m and we can disregard 
the requirement that i should be ≤ m. Now for given j there are k + 1 
values of i satisfying (A) if k denotes the greatest integer which is less 
than 

$$\frac{l}{2\delta} \cos j\omega.$$

In other words, k is an integer determined by the conditions 

$$k < \frac{l}{2\delta} \cos j\omega \leq k + 1.$$

The number of possible values for i corresponding to a given j can 
therefore be represented thus 

$$m_j = \frac{l}{2\delta} \cos j\omega + \theta_j$$

where θ_j may depend on j but for all j is ≥ 0 and < 1. Taking the sum 
of all the m_j corresponding to j = 0, 1, 2, ... n, we obtain the number 
of favorable cases 

$$M = \frac{l}{2\delta}(1 + \cos\omega + \cos 2\omega + \cdots + \cos n\omega) + n\theta$$

where θ again is a number satisfying the inequalities 

$$0 \leq \theta < 1.$$
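But, as is well known, 

$$1 + \cos\omega + \cos 2\omega + \cdots + \cos n\omega = \frac{1}{2} + \frac{\sin\left(n + \frac{1}{2}\right)\omega}{2\sin\frac{\omega}{2}}$$

or, because nω = π/2, 

$$1 + \cos\omega + \cos 2\omega + \cdots + \cos n\omega = \frac{1}{2} + \frac{1}{2}\cot\frac{\omega}{2};$$

therefore 

$$M = \frac{l}{4\delta}\left(1 + \cot\frac{\omega}{2}\right) + n\theta.$$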

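Dividing this by N = (m + 1)(n + 1) and substituting for δ and ω 
their expressions 

$$\delta = \frac{h}{2m}, \qquad \omega = \frac{\pi}{2n}$$

we obtain the probability in the problem with a finite number of cases 

$$\frac{M}{N} = \frac{l}{2h}\cdot\frac{m}{m+1}\cdot\frac{1}{n+1} + \frac{l}{2h}\cdot\frac{m}{m+1}\cdot\frac{\cot\frac{\pi}{4n}}{n+1} + \frac{n\theta}{(n+1)(m+1)}.$$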


The probability in Buffon's problem will be obtained by making m 
and n increase indefinitely in the above expression. Now, since 

$$\lim \frac{m}{m+1} = 1, \qquad \lim \frac{m}{(m+1)(n+1)} = \lim \frac{n\theta}{(n+1)(m+1)} = 0 \qquad (m, n \to \infty)$$

and 

$$\lim \frac{\cot\frac{\pi}{4n}}{n+1} = \frac{4}{\pi},$$

we have 

$$\lim \frac{M}{N} = \frac{2l}{\pi h}.$$

Thus we arrive at the expression of probability 

$$p = \frac{2l}{h\pi}$$

in Buffon's needle problem. 
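
The finite problem used in this appendix is easy to enumerate directly, which gives a numerical check that M/N approaches 2l/(πh). The sketch below simply counts the pairs (i, j) satisfying condition (A); the function name and the choice m = n = 400 are illustrative, and floating-point rounding at the boundary j = n may add one spurious case, which is negligible here.

```python
import math


def finite_case_probability(l: float, h: float, m: int, n: int) -> float:
    """Ratio M/N of favorable to possible cases in the discretized problem."""
    delta = h / (2 * m)          # step of the distance x
    omega = math.pi / (2 * n)    # step of the angle phi
    favorable = 0
    for j in range(n + 1):
        reach = 0.5 * l * math.cos(j * omega)
        for i in range(m + 1):
            if i * delta < reach:    # condition (A)
                favorable += 1
            else:
                break                # i*delta only grows with i
    return favorable / ((m + 1) * (n + 1))


limit = 2 * 36 / (math.pi * 45)      # Wolf's needle and strip width
approx = finite_case_probability(36, 45, 400, 400)
print(round(approx, 4), round(limit, 4))
```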
Problems for Solution 

Another very simple proof of Bernoulli's theorem, due to Tshebysheff (1821-1894),
is based upon the following considerations: 

1. Prove the following identities: 

$$\sum_{m=0}^{n} T_m (m - np) = 0, \qquad \sum_{m=0}^{n} T_m (m - np)^2 = npq.$$

Indication of the Proof. Differentiate the identity 

$$e^{-npu}(pe^{u} + q)^n = \sum_{m=0}^{n} T_m e^{(m - np)u}$$

twice with respect to u and set u = 0. 

2. If Q is the probability of the inequality |m - np| ≥ nε, prove that 

$$Q < \frac{pq}{n\varepsilon^2}.$$

Indication of the Proof. In the identity 

$$\sum_{m=0}^{n} T_m (m - np)^2 = npq$$

drop all the terms in which |m - np| < nε and in the remaining terms replace 
(m - np)² by n²ε². The resulting inequality 

$$n^2\varepsilon^2 \sum_{|m - np| \geq n\varepsilon} T_m < npq$$

is equivalent to the statement. 

3. Prove that 

$$P > 1 - \eta$$

if n > pq/ηε². 

Indication of the Proof. P = 1 - Q, Q < pq/nε² and pq/nε² < η if n > pq/ηε². 
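
The identities of Problem 1 and the inequality of Problem 2 can be verified exactly for particular values with rational arithmetic. The following is an illustrative check only, not a proof; the values n = 60, p = 1/3, ε = 1/10 and the helper names are assumptions chosen for the example.

```python
from fractions import Fraction
from math import comb


def binomial_terms(n: int, p: Fraction) -> list:
    """T_0, ..., T_n: probabilities of m successes in n trials."""
    q = 1 - p
    return [comb(n, m) * p**m * q ** (n - m) for m in range(n + 1)]


def tshebysheff_check(n: int, p: Fraction, eps: Fraction):
    """Return the two moment sums, Q, and the bound pq/(n eps^2)."""
    q = 1 - p
    T = binomial_terms(n, p)
    first = sum(t * (m - n * p) for m, t in enumerate(T))
    second = sum(t * (m - n * p) ** 2 for m, t in enumerate(T))
    Q = sum(t for m, t in enumerate(T) if abs(m - n * p) >= n * eps)
    bound = p * q / (n * eps**2)
    return first, second, Q, bound


first, second, Q, bound = tshebysheff_check(60, Fraction(1, 3), Fraction(1, 10))
print(first, second, float(Q), float(bound))
```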
The following two problems show how probability considerations can be used in 
proving purely analytical propositions. 

4. S. Bernstein's Proof of Weierstrass' Theorem. The famous theorem due to Weierstrass
states that for any continuous function f(x) in a closed interval a ≤ x ≤ b there 
exists a polynomial P(x) such that 

$$|f(x) - P(x)| < \sigma$$

for a ≤ x ≤ b where σ is an arbitrary positive number. By a proper linear transformation
the interval (a, b) can be transformed into the interval (0, 1). According 
to S. Bernstein, the polynomial 

$$P(x) = \sum_{m=0}^{n} C_n^m x^m (1 - x)^{n-m} f\left(\frac{m}{n}\right)$$

for sufficiently large n satisfies the inequality 

$$|f(x) - P(x)| < \sigma$$

uniformly in the interval 0 ≤ x ≤ 1. 

Indication of the Proof. For x = 0 and x = 1 we have f(0) = P(0) and f(1) = P(1). 
It suffices to prove the statement for 0 < x < 1. Let x be a constant probability in 
n independent trials. We have 

$$(a) \qquad f(x) - P(x) = \sum_{m=0}^{n} T_m \left[ f(x) - f\left(\frac{m}{n}\right) \right].$$

By the property of continuous functions, there is a number ε corresponding to any 
positive number σ such that 

$$|f(x') - f(x)| < \frac{\sigma}{2}$$

whenever 

$$|x' - x| < \varepsilon \qquad (0 \leq x', x \leq 1).$$

Also, there exists a number M such that |f(x)| ≤ M for 0 ≤ x ≤ 1. From equation 
(a) we get 

$$|f(x) - P(x)| \leq \frac{\sigma}{2} P + 2MR$$

where P and R are, respectively, the probabilities of the inequalities 

$$\left| \frac{m}{n} - x \right| < \varepsilon \qquad \text{and} \qquad \left| \frac{m}{n} - x \right| \geq \varepsilon.$$

Now P ≤ 1 and 

$$R < \eta$$

if n > 1/4ε²η. Take η = σ/4M; then 

$$|f(x) - P(x)| < \sigma$$

if 

$$n > \frac{M}{\sigma\varepsilon^2}.$$
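
Bernstein's polynomial is simple to evaluate numerically, which makes the uniform convergence visible. The sketch below uses the test function |x - 1/2| and a uniform grid of 201 points; both are choices made here for illustration, and floating-point arithmetic limits the attainable accuracy.

```python
import math


def bernstein(f, n: int, x: float) -> float:
    """Value at x of Bernstein's polynomial of degree n for the function f."""
    return sum(
        math.comb(n, m) * x**m * (1 - x) ** (n - m) * f(m / n)
        for m in range(n + 1)
    )


def max_error(f, n: int, grid: int = 201) -> float:
    """Largest |f(x) - P(x)| over a uniform grid on [0, 1]."""
    points = [k / (grid - 1) for k in range(grid)]
    return max(abs(f(x) - bernstein(f, n, x)) for x in points)


def f(x: float) -> float:
    return abs(x - 0.5)


for n in (25, 100, 400):
    print(n, round(max_error(f, n), 4))
```

For this function the largest deviation occurs at x = 1/2 and decreases roughly like the inverse square root of n, in line with the estimate n > M/σε².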


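5. Show that 

$$\frac{\displaystyle\int_{\frac{m}{n}-\varepsilon}^{\frac{m}{n}+\varepsilon} x^m (1 - x)^{n-m}\,dx}{\displaystyle\int_0^1 x^m (1 - x)^{n-m}\,dx} > 1 - \frac{1}{2(n + 1)\varepsilon^2}$$

provided 0 < m < n and ε > 0, m/n - ε > 0, m/n + ε < 1 (Castelnuovo). 

Indication of the Proof. By Prob. 6, Chap. IV, page 72, the ratio 

$$\frac{\displaystyle\int_0^{\frac{m}{n}-\varepsilon} x^m (1 - x)^{n-m}\,dx}{\displaystyle\int_0^1 x^m (1 - x)^{n-m}\,dx}$$

represents the probability Q of at least m + 1 successes in a series of n + 1 independent
trials with constant probability 

$$p = \frac{m}{n} - \varepsilon.$$

Set 

$$m + 1 = (n + 1)p + (n + 1)\sigma$$

whence 

$$\sigma = \frac{n - m}{n(n + 1)} + \varepsilon > \varepsilon.$$

But 

$$Q < \frac{pq}{(n + 1)\sigma^2} < \frac{1}{4(n + 1)\varepsilon^2}.$$

Hence 

$$\frac{\displaystyle\int_0^{\frac{m}{n}-\varepsilon} x^m (1 - x)^{n-m}\,dx}{\displaystyle\int_0^1 x^m (1 - x)^{n-m}\,dx} < \frac{1}{4(n + 1)\varepsilon^2}$$

and by a similar argument 

$$\frac{\displaystyle\int_{\frac{m}{n}+\varepsilon}^{1} x^m (1 - x)^{n-m}\,dx}{\displaystyle\int_0^1 x^m (1 - x)^{n-m}\,dx} < \frac{1}{4(n + 1)\varepsilon^2}.$$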
CHAPTER VII 


APPROXIMATE EVALUATION OF PROBABILITIES IN 
BERNOULLIAN CASE 

1. In connection with Bernoulli's theorem, the following important 
question arises: when the number of trials is large, how can one find, at 
least approximately, the probability of the inequality 

$$\left| \frac{m}{n} - p \right| \leq \varepsilon$$

where ε is a given number? Or, in a more general form: How can one 
find, approximately, the probability of the inequalities 

$$l \leq m \leq l'$$

where l and l' are given integers, the number of trials n being large? 

The exact formula for this probability is 

$$P = \sum_{s=l}^{l'} T_s$$

where T_s, as before, represents the probability of s successes in n trials. 
While this formula cannot be of any practical use when n and l' - l 
are large numbers, yet it is precisely such cases that present the greatest 
theoretical and practical interest. Hence, the problem naturally arises 
of substituting for the exact expression of P an approximate formula 
which will be easy to use in practice and which, for large n, will give a 
sufficiently close approximation to P. De Moivre was the first successfully
to attack this difficult problem. After him, in essentially the 
same way, but using more powerful analytical tools, Laplace succeeded 
in establishing a simple approximate formula which is given in all books 
on probability. 

When we use an approximate formula instead of an exact one, there 
is always this question to consider: How large is the committed error? 
If, as is usually done, this question is left unanswered, the derivation of 
Laplace's formula becomes an easy matter. However, to estimate the 
error, a comparatively long and detailed investigation is required. Except 
for its length, this investigation is not very difficult. 

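2. First we shall present the probability T_s in a convenient analytical 
form. The identity 

$$F(t) = (pt + q)^n = T_0 + T_1 t + T_2 t^2 + \cdots + T_n t^n$$

after substituting t = e^{iφ} becomes 

$$F(e^{i\varphi}) = T_0 + T_1 e^{i\varphi} + T_2 e^{2i\varphi} + \cdots + T_n e^{ni\varphi}.$$

Multiplying it by e^{-isφ} and integrating between -π and π, we get 

$$\int_{-\pi}^{\pi} e^{-is\varphi} F(e^{i\varphi})\,d\varphi = 2\pi T_s$$

because for an integral exponent k 

$$\int_{-\pi}^{\pi} e^{ik\varphi}\,d\varphi = \begin{cases} 0 & \text{if } k \neq 0 \\ 2\pi & \text{if } k = 0. \end{cases}$$

Thus 

$$T_s = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-is\varphi} F(e^{i\varphi})\,d\varphi$$

and this is the expression for T_s suitable for our purposes. To find the 
sum 

$$P = \sum_{s=l}^{l'} T_s$$

we observe first that 

$$\sum_{s=l}^{l'} e^{-is\varphi} = \frac{e^{-i(l - \frac{1}{2})\varphi} - e^{-i(l' + \frac{1}{2})\varphi}}{2i \sin\frac{\varphi}{2}}.$$

On the other hand, the complex number F(e^{iφ}) can be presented in 
trigonometrical form, thus: 

$$F(e^{i\varphi}) = R e^{i\Theta}$$

whence 

$$P = \frac{1}{4\pi i} \int_{-\pi}^{\pi} R\, \frac{e^{i[\Theta - (l - \frac{1}{2})\varphi]} - e^{i[\Theta - (l' + \frac{1}{2})\varphi]}}{\sin\frac{\varphi}{2}}\,d\varphi$$

or, because P is real, 

$$P = \frac{1}{4\pi} \int_{-\pi}^{\pi} R\, \frac{\sin[\Theta - (l - \frac{1}{2})\varphi] - \sin[\Theta - (l' + \frac{1}{2})\varphi]}{\sin\frac{\varphi}{2}}\,d\varphi.$$

Finally, because R is an even function of φ and Θ is an odd one, we can 
extend the integration over the interval 0, π on the condition that we 
double the result. Thus we obtain 

$$P = \frac{1}{2\pi} \int_0^{\pi} R\, \frac{\sin[(l' + \frac{1}{2})\varphi - \Theta]}{\sin\frac{\varphi}{2}}\,d\varphi - \frac{1}{2\pi} \int_0^{\pi} R\, \frac{\sin[(l - \frac{1}{2})\varphi - \Theta]}{\sin\frac{\varphi}{2}}\,d\varphi.$$

It is convenient to introduce instead of l and l' two numbers ζ_1 and ζ_2 
defined by 

$$l = np + \tfrac{1}{2} + \zeta_1 \sqrt{B_n}, \qquad l' = np - \tfrac{1}{2} + \zeta_2 \sqrt{B_n}$$

where B_n = npq. Setting further 

$$\Theta = np\varphi + \chi,$$

P can be presented as 

$$P = P_2 - P_1$$

where P_1 and P_2 are obtained by taking ζ = ζ_1 and ζ = ζ_2 in the integral 

$$J = \frac{1}{2\pi} \int_0^{\pi} R\, \frac{\sin(\zeta \sqrt{B_n}\,\varphi - \chi)}{\sin\frac{\varphi}{2}}\,d\varphi.$$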

3. Our next aim is to establish upper and lower limits for R. 
Evidently 


P == (p2 4- g2 2pq cos <p)- = — 4pq sin^ — p^. 


5log( 


1 — 4pq sin- 


■2pq sin’- ^ - ^{‘IpqY .sin< | - 
- g(4p9)’ sin® I 


whence 


log p < -2pq sin2 


Since < 7r/2, we have 


and consequently 




log p < 


p < e 
for all values of φ in the interval of integration. On the other hand, we 
have 

^3 


and 




2 ^ \ ^ ^ 

2^ 4 48’ 


< 24 


which gives another upper bound for p: 

(3) p < e 2_ 

The corresponding upper bounds for R will be 


(4) 

(5) 


R < e 


2Bu „ 
- ^2 


_«V+^V 

R < e 


To find a lower bound for R we shall assume φ ≤ π/2. We can 
present log ρ thus: 


log p = -^‘P^ - |(4pg)2 sin< | -H 2pq 


On the other hand, 


I) 


- g(4p9)* sin* I - 


—(4p(j')^ sin** — 

~{ipgy sin* I + ^{^pqY sin* | + • • • < - - - < |(4pg)*sin*| 

1 — 4^7 sin^ I 

and 


(i)’ 


sin2|>lsin*| 


so that 

2pq 


l) - sin21| - l(4p5)* sin* | - . . . > 


2pq 


sin* I - 


— |(4p§')* sin* ^ sin* ^|l - 32pV sin* || > 0 

and consequently 

log P > - 1(4??)* sin* I > - ^^* 
if ^ g Hence, 

( 6 ) 22 > 


and this is valid for φ ≤ π/2. 

4. Let T be defined by 

Assuming Bn ^ 25 from now on, we shall have, 


and a fortiori τ < π/2. Let us suppose now that φ varies in the interval 
0 ≤ φ ≤ τ. By inequality (6) we shall have 


- Bn<p^ 

R — e ^ > e 




> 




1 II 


because --1>—a:forx>0 and pq ^ 

On the other hand, using inequality (5), we find that 

R _ -4 < 


since 


Bnr > 3 

^^24 =le'8<J. 


From the two inequalities just established it follows that 


(7) 

in the interval 








0 ^ ^ ^ T. 


5. We turn now to the angle Θ. Evidently 

$$\Theta = n\, \operatorname{arc\,tg} \frac{p \sin\varphi}{q + p \cos\varphi} = n\omega$$

where 

$$\omega = \operatorname{arc\,tg} \frac{p \sin\varphi}{q + p \cos\varphi}.$$

By successive derivations with respect to φ we find 
^ + pg' cos ip ^ d‘^0) _ pg(p ~ g) sin <p 

dip p 2 ^ 2pg cos ip + g^^ dip^ (p^ 2pg cos ip + g^y 
+ (1 - ^Pg) cos - 2pg cos2 
pgvp g) (po 2pg cos ip + g^y 

d^w _ ( _ \ v?[ — 1 + 4pg'+20p‘^g'2-|-gpg(l _2pg) cos ip — ^p'^g’^ cos^ ip\ 

dip* ^ (p2+2pg' cos 

and for ^ = 0 

(I). - *’• (s). - "• ($). - - «>■ 

Furthermore, one easily verifies that in the interval 0 ^ g x/2 
|0| ^ |p9b - 9l(l - 4pg sin^ l) 

^ ^ 2pg|p - g|^l - ‘ipq sin^ <p- 

Hence, applying Taylor^s formula and supposing 0 ^ ^ r, we get for x 

(8) X = iBnip ~ g)ip^ + Mip^ 

where 

(9) \M\ < ^Bn\p - g|(l - pqr^)-*, 

or 

(10) X = 
where 

(11) |L| < -AB„\p - 9i(l - pqT^)-\ 

Using inequalities (9) and (11), wc easily find 

(12) sin (fv^v? - x) = sin — kBn{p - g)ip^ cos {^\/Br,ip) + r 

where 

(13) |r| < ,VBn|p - g\(l - pgr2)-V^ + ^UBlip - gm - pgr^y^ip^ 
provided 0 ^ v? ^ r. 

6. To find an appropriate expression of the integral J we split it into 
two integrals, J_1 and J_2, taken respectively between limits 0, τ and τ, π. 
We have 
because sin ~ Let ri = r then by inequality (4) 

Z TT ^ 

2Bn 


pT.'i'P ^ d,p _ f" 

J., ^ Kit. « ■ 


But for positive :c the following inequality holds: 


(14) 

(consequently 


r ^ e--^du 
Jx U 


< 




i: 


J^dip f>-i^nT 2 ^-iy/Wn 


<P LfnT^ 3^1 

Noting that R{<p) is a decreasing function of we have for t ^ (p ^ ti 
RM ^ R{r) < 


Hence, 




and combining this inequality with the one previously established, we 
have finally 


(15) 


l^2| < (I log I + 




7. More elaborate considerations are necessary to separate the 
principal term and to estimate the error term in J_1. Making use of the 
inequality 


1 


sin X X 


< 


6 sin X 


we can present Ji thus: 


= u 


2 fsin (f y/Bnip — x) 


dip “I- A 


where 


|4| < 


-,r 

in 


R(pd(pj 


487r sin ^ 


and, because R < in the interval 0 < (p < t 
Since ^ % we find by direct numerical calculation 


-^- < 0.0205, 

32?r sin ^ 


and so, finally, 


|A| < 0.0205J5-1. 


8. Referring now to inequality (7), we can write 


2 Bin ^ 2 

where 

^ kl' 

Combining this with the result of the preceding section, we can present 
Ji thus 

(16) •^1 = 1- sm (rv^y - ^ 

JttJo ^ 

and 

IA 2 I < 0.06055-1. 

9. To simplify the integral in the right member of (16), we substitute 
for sin(ζ√B_n φ - χ) its expression (12). Taking into account inequality
(13), we get (17): 

2 = 2 rv>a„..sin _ 

Air Jo tp ^Trjo (p 

~ 6ir ~ Jo ® ^ {tV^«<p)d<p + A, 

where 


- ?l(l 


But 


i 





e~^^”*^^<p^d<p 


+ 


8B-», 
and so 


lAj|.< - 9l(l - pqr^)-* + “(p - g)Hl - 

Now pq ^ 3"^, ^ Bn ^ 25, consequently 

-4=BnHi - pqr^)-* ^ —^^ 7 =-fT?Y < 0.0385. 

W2t 20V^V7/ 

On the other hand, 

1 - ^ 1 - |{(^^) - (^) } = ^ + |(P - 9)^ 

and for positive x the maximum of 

is attained forx^ = whence it follows that 

^IP - ,1(1 - m')-’ i es(53)‘(i)' < 


Taking into account all this, we have 

IA 3 I < 0.09|p - q\B-\ 

10. As to integrals in the right-hand member of (17) we can write 


(18) 


ir 

27rJo 




sin (fV^V’) 


dip 


= Af' 

^ttJo 




(19) —- f cos {i:-\/W„ip)dip = 

mr Jo 

= r _. g) cos i^VK<p)dv -b As 

6 ir Jo 


where 


and 


because 


IA 4 I < - 

‘ ' TTjr <P StT 


e-^Wdu < xer*' 
for X > 1, as can easily be proved. Finally, taking into account (15), 
(16), (17), (18), (19), we get 


( 20 ) 


2jr, 


j; 




0.065 + 0.09|p - q\ 


+ 


f cos {^\/R„<p)d<p 


since for Bn ^ 25 


3|„- + ^ + :B:'+ fiji <1. 

A O' A ' ^TT ' /TV^O 


rV^ 


It now remains to evaluate definite integrals in (20). We have 


( 21 ) 

( 22 ) 


2^ 

2ir 

Bnip - g) 
Qtt 


^ 00 
Jo ^ 




sin (rV^^) 


dif 




sin 


u 


du 


ao 

I cos (^\/Bn(p)d<p == 


^Vb 


^ CD m2 

-I C 

nJo 


cos 


Differentiating the well-known integral 


X 


62 


I 6“^^' COS hxdx = K\ he (a > 0) 

0 ^ \ a 

twice with respect to b, and after that substituting a = 3-^, 6 = f, we 
find for (22) this expression: 

' .(1 - f2)e-lf>. 


6V2irB„ 

On the other hand, an integral of the type 


L(a) 




,sin au 


du 


can be reduced to a so-called ^^probability integral/^ In fact, the 
derivation with respect to a gives 

0 e cos audu = ^ 

and since L(0) = 0, 

L{a) = 
Consequently, integral (21) can be reduced to 


1 

—I 

V^Jo 

Having found an approximate expression of the integral J, we substitute
in it ζ_2 and ζ_1 for ζ and take the difference of the results to 
find the desired expression of P. 

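11. The result of this long and detailed investigation can be summarized
as follows: 

Theorem. Let m be the number of occurrences of an event in a series 
of n independent trials with the constant probability p. The probability P 
of the inequalities 

$$np + \tfrac{1}{2} + \zeta_1 \sqrt{npq} \leq m \leq np - \tfrac{1}{2} + \zeta_2 \sqrt{npq}$$

where extreme members are integers, can be represented in the form 

$$(23) \qquad P = \frac{1}{\sqrt{2\pi}} \int_{\zeta_1}^{\zeta_2} e^{-\frac{u^2}{2}}\,du + \frac{q - p}{6\sqrt{2\pi npq}} \left[ (1 - \zeta_2^2) e^{-\frac{\zeta_2^2}{2}} - (1 - \zeta_1^2) e^{-\frac{\zeta_1^2}{2}} \right] + \omega.$$

The error term ω satisfies the inequality 

$$|\omega| < \frac{0.13 + 0.18|p - q|}{npq} + e^{-\frac{3}{2}\sqrt{npq}}$$

provided npq ≥ 25. 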
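
Formula (23) and its error limit can be compared with the exact binomial sum. The sketch below assumes the constants 0.13 and 0.18 in the error term, evaluates the exact sum through logarithms of factorials (`math.lgamma`) to avoid overflow, and uses two illustrative cases: an unsymmetrical one with n = 400, p = 0.2, and the case n = 9,000, p = 1/3 with limits 2,910 and 3,090.

```python
import math


def exact_probability(n: int, p: float, lo: int, hi: int) -> float:
    """Sum of binomial probabilities T_m for lo <= m <= hi."""
    def log_term(m: int) -> float:
        return (math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
                + m * math.log(p) + (n - m) * math.log(1 - p))
    return sum(math.exp(log_term(m)) for m in range(lo, hi + 1))


def formula_23(n: int, p: float, lo: int, hi: int) -> float:
    """Right member of (23) without the error term."""
    q = 1 - p
    s = math.sqrt(n * p * q)
    z1 = (lo - n * p - 0.5) / s      # from lo = np + 1/2 + z1*sqrt(npq)
    z2 = (hi - n * p + 0.5) / s      # from hi = np - 1/2 + z2*sqrt(npq)
    Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
    g = lambda z: (1 - z * z) * math.exp(-z * z / 2)
    return Phi(z2) - Phi(z1) + (q - p) / (6 * math.sqrt(2 * math.pi) * s) * (g(z2) - g(z1))


def error_limit(n: int, p: float) -> float:
    """Assumed limit of |omega|, valid when npq >= 25."""
    npq = n * p * (1 - p)
    return (0.13 + 0.18 * abs(2 * p - 1)) / npq + math.exp(-1.5 * math.sqrt(npq))


for n, p, lo, hi in [(400, 0.2, 70, 95), (9000, 1 / 3, 2910, 3090)]:
    print(exact_probability(n, p, lo, hi), formula_23(n, p, lo, hi), error_limit(n, p))
```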

By slightly increasing the limit of the error term, this theorem can 
be put into more convenient form. Let t_1 and t_2 be two arbitrary real 
numbers and let P denote the probability of the inequalities 

$$np + t_1 \sqrt{npq} \leq m \leq np + t_2 \sqrt{npq}.$$

If the greatest integers contained in 

$$np + t_2 \sqrt{npq} \qquad \text{and} \qquad nq - t_1 \sqrt{npq}$$

are, respectively, Λ_2 and Λ_1, the preceding inequalities are equivalent to 

$$n - \Lambda_1 \leq m \leq \Lambda_2.$$

To apply the theorem, we set 

$$np - \tfrac{1}{2} + \zeta_2 \sqrt{npq} = \Lambda_2 = np + t_2 \sqrt{npq} - \theta_2$$

$$np + \tfrac{1}{2} + \zeta_1 \sqrt{npq} = n - \Lambda_1 = np + t_1 \sqrt{npq} + \theta_1$$

θ_2 and θ_1 being, respectively, the fractional parts of np + t_2√(npq) and 
nq - t_1√(npq). Hence, 

$$\zeta_2 = t_2 + \frac{\frac{1}{2} - \theta_2}{\sqrt{npq}}, \qquad \zeta_1 = t_1 - \frac{\frac{1}{2} - \theta_1}{\sqrt{npq}}.$$
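Applying Taylor's formula, it is easy to verify that 

$$\left| \frac{1}{\sqrt{2\pi}} \int_{\zeta_1}^{\zeta_2} e^{-\frac{u^2}{2}}\,du - \frac{1}{\sqrt{2\pi}} \int_{t_1}^{t_2} e^{-\frac{u^2}{2}}\,du - \frac{\frac{1}{2} - \theta_1}{\sqrt{2\pi npq}} e^{-\frac{t_1^2}{2}} - \frac{\frac{1}{2} - \theta_2}{\sqrt{2\pi npq}} e^{-\frac{t_2^2}{2}} \right| \leq \frac{0.061}{npq}$$

$$\left| \frac{q - p}{6\sqrt{2\pi npq}} \left[ (1 - \zeta_2^2) e^{-\frac{\zeta_2^2}{2}} - (1 - \zeta_1^2) e^{-\frac{\zeta_1^2}{2}} - (1 - t_2^2) e^{-\frac{t_2^2}{2}} + (1 - t_1^2) e^{-\frac{t_1^2}{2}} \right] \right| \leq \frac{0.069|p - q|}{npq}$$

whence, finally, we can draw the following conclusion: For any two 
real numbers t_1, t_2, the probability of the inequalities 

$$t_1 \sqrt{npq} \leq m - np \leq t_2 \sqrt{npq}$$

can be expressed as follows: 

$$P = \frac{1}{\sqrt{2\pi}} \int_{t_1}^{t_2} e^{-\frac{u^2}{2}}\,du + \frac{\frac{1}{2} - \theta_1}{\sqrt{2\pi npq}} e^{-\frac{t_1^2}{2}} + \frac{\frac{1}{2} - \theta_2}{\sqrt{2\pi npq}} e^{-\frac{t_2^2}{2}} + \frac{q - p}{6\sqrt{2\pi npq}} \left[ (1 - t_2^2) e^{-\frac{t_2^2}{2}} - (1 - t_1^2) e^{-\frac{t_1^2}{2}} \right] + \Omega$$

where θ_2 and θ_1 are the respective fractional parts of 

$$np + t_2 \sqrt{npq} \qquad \text{and} \qquad nq - t_1 \sqrt{npq}$$

and 

$$|\Omega| < \frac{0.20 + 0.25|p - q|}{npq} + e^{-\frac{3}{2}\sqrt{npq}}$$

provided npq ≥ 25. 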

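In particular, if t_2 = -t_1 = t, the probability of the inequality 

$$|m - np| \leq t \sqrt{npq}$$

is expressed by 

$$P = \frac{2}{\sqrt{2\pi}} \int_0^t e^{-\frac{u^2}{2}}\,du + \frac{1 - \theta_1 - \theta_2}{\sqrt{2\pi npq}} e^{-\frac{t^2}{2}} + \Omega$$

with the same upper limit for Ω. Laplace, supposing that np + t√(npq) 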
is an integer in which case 62 = 0 and Oi is a fraction less than (npq)~^, 
gives for P the approximate expression 


$$P = \frac{2}{\sqrt{2\pi}} \int_0^t e^{-\frac{u^2}{2}}\,du + \frac{e^{-\frac{t^2}{2}}}{\sqrt{2\pi npq}}$$

without indicating the limit of the error. Evidently Laplace's formula 
coincides with the formula obtained here by a rigorous analysis, save for 
terms of the same order as the error term Ω. 
To find an approximate expression for the probability P of the 
inequality 

$$\left| \frac{m}{n} - p \right| \leq \varepsilon$$

it suffices to take 

$$t = \varepsilon \sqrt{\frac{n}{pq}}.$$

Then 

$$P = \frac{2}{\sqrt{2\pi}} \int_0^{\varepsilon\sqrt{\frac{n}{pq}}} e^{-\frac{u^2}{2}}\,du + \frac{1 - \theta_1 - \theta_2}{\sqrt{2\pi npq}} e^{-\frac{n\varepsilon^2}{2pq}} + \Omega$$

and evidently P tends to 1 as n increases indefinitely. This is the second 
proof of Bernoulli's theorem. 
Referring to the above expression for the probability of the inequalities 

$$t_1 \sqrt{npq} \leq m - np \leq t_2 \sqrt{npq}$$

and supposing that the number of trials n increases indefinitely while 
t_1 and t_2 remain fixed, we immediately perceive the truth of the following 
limit theorem: The probability of the inequalities 

$$t_1 \leq \frac{m - np}{\sqrt{npq}} \leq t_2$$

tends to the limit 

$$\frac{1}{\sqrt{2\pi}} \int_{t_1}^{t_2} e^{-\frac{u^2}{2}}\,du$$

as n tends to infinity. 

This limit theorem is a very particular case of an extremely general 
theorem which we shall consider in Chap. XIV. 

12. To form an idea of the accuracy to be expected by using the 
foregoing approximate formulas, it is worth while to take up a few 
numerical examples. Let n = 200, p = q = 1/2 and 

$$95 \leq m \leq 105.$$

The exact expression of the probability that m will satisfy these inequalities
is 

$$P = \frac{200!}{100!\,100!}\, 2^{-200} \left\{ 1 + 2\left[ \frac{100}{101} + \frac{100 \cdot 99}{101 \cdot 102} + \frac{100 \cdot 99 \cdot 98}{101 \cdot 102 \cdot 103} + \frac{100 \cdot 99 \cdot 98 \cdot 97}{101 \cdot 102 \cdot 103 \cdot 104} + \frac{100 \cdot 99 \cdot 98 \cdot 97 \cdot 96}{101 \cdot 102 \cdot 103 \cdot 104 \cdot 105} \right] \right\}.$$

The number in the braces is found to be 9.995776 and its logarithm to 
five decimals 

0.99982. 

The logarithm of the first factor, again to five decimals, is 

$$\bar{2}.75088,$$

whence 

$$\log P = \bar{1}.75070; \qquad P = 0.56325,$$

and this value may be regarded as correct to five decimals. Let us see 
now what result is obtained by using approximate formulas. In our 
example 

$$t\sqrt{npq} = t\sqrt{50} = 5; \qquad t = \frac{1}{\sqrt{2}} = 0.707107$$

and 

$$\frac{2}{\sqrt{2\pi}} \int_0^t e^{-\frac{u^2}{2}}\,du = 0.52050.$$

The additional term 

$$\frac{e^{-\frac{1}{4}}}{\sqrt{100\pi}} = 0.04394$$

and by Laplace's formula 

$$P = 0.56444.$$

This is greater than the true value of P by 0.00119. Now, the theoretical 
limit of the error is nearly 

$$\frac{0.20}{50} = 0.004$$

so that, actually, Laplace's formula gives an even closer approximation 
than can be expected theoretically. 
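
The figures of this first example can be reproduced directly. The sketch below computes the exact sum with integer arithmetic and the two terms of Laplace's formula with `math.erf`; the function names are illustrative.

```python
import math


def exact_fair_coin(n: int, lo: int, hi: int) -> float:
    """P(lo <= m <= hi) for p = q = 1/2, by direct summation."""
    return sum(math.comb(n, m) for m in range(lo, hi + 1)) / 2**n


def laplace_terms(n: int, p: float, t: float):
    """Integral term and additional term of Laplace's formula."""
    npq = n * p * (1 - p)
    main = math.erf(t / math.sqrt(2))   # (2/sqrt(2 pi)) * integral from 0 to t
    extra = math.exp(-t * t / 2) / math.sqrt(2 * math.pi * npq)
    return main, extra


exact = exact_fair_coin(200, 95, 105)
main, extra = laplace_terms(200, 0.5, 5 / math.sqrt(50))
print(round(exact, 5), round(main, 5), round(extra, 5), round(main + extra, 5))
```

The difference between `main + extra` and `exact` is about 0.0012, well inside the theoretical limit 0.004.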

When npq is large, the second term in Laplace's formula ordinarily 
is omitted and the probability is computed by using a simpler expression: 

$$P = \frac{2}{\sqrt{2\pi}} \int_0^t e^{-\frac{u^2}{2}}\,du.$$

In our case this expression would give 

$$P = 0.52050$$

instead of 0.56325 with the error about 0.043, which amounts to about 
8 per cent of the exact number. Such a comparatively large error is 
explained by the fact that in our example npq = 50 is not large enough. 
In practice, when npq attains a few hundreds, the simplified expression for 
P can be used when an accuracy of about two or three decimals is considered
as satisfactory. In general, the larger t is, the better approximation
can be expected. 

For the second example, let us evaluate the probability that in 6,520 
trials the relative frequency of an event with the probability p = 3/5 
will differ from that probability by less than ε = 1/50. To find t, we 
have the equation 

$$t\sqrt{npq} = \varepsilon n$$

where 

$$n = 6520, \qquad p = \tfrac{3}{5}, \qquad q = \tfrac{2}{5}, \qquad \varepsilon = \tfrac{1}{50}$$

which gives 

$$t = \frac{130.4}{\sqrt{1564.8}} = 3.2965,$$

and, correspondingly, 

$$\frac{2}{\sqrt{2\pi}} \int_0^t e^{-\frac{u^2}{2}}\,du = 0.999021.$$

Since m satisfies the inequalities 

$$3912 - 130.4 \leq m \leq 3912 + 130.4$$

the fractions θ_1 and θ_2 are θ_1 = θ_2 = 0.4 and the additional term is 

$$\frac{0.2}{\sqrt{3129.6\pi}}\, e^{-\frac{t^2}{2}} = 0.000009.$$

Hence, the approximate value of P is 

$$P = 0.999030.$$

To judge what is the error, we can apply Markoff's method of continued
fractions to find the limits between which P lies. These limits are 

0.999028 and 0.999044. 

The result obtained by using an approximate formula is unusually good, 
which can be explained by the fact that in our example t is a rather large 
number. Even the simplified formula gives 0.999021, very near the 
true value. 
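
The second example can be checked in the same way. In this sketch p and ε are held as exact fractions so that the fractional parts θ_1 and θ_2 are not disturbed by rounding, and the exact sum is evaluated through `math.lgamma` because the individual factors would underflow; none of this is the author's method, which relies on Markoff's continued fractions.

```python
import math
from fractions import Fraction


def exact_probability(n: int, p: Fraction, eps: Fraction) -> float:
    """P(|m/n - p| <= eps) by summing binomial terms in logarithmic form."""
    lo = math.ceil(n * p - n * eps)
    hi = math.floor(n * p + n * eps)
    pf = float(p)

    def log_term(m: int) -> float:
        return (math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
                + m * math.log(pf) + (n - m) * math.log(1 - pf))

    return sum(math.exp(log_term(m)) for m in range(lo, hi + 1))


def approximate_terms(n: int, p: Fraction, eps: Fraction):
    """Integral term and additional term with the fractions theta_1, theta_2."""
    q = 1 - p
    npq = float(n * p * q)
    t = float(n * eps) / math.sqrt(npq)
    theta2 = float((n * p + n * eps) % 1)   # fractional part of np + t*sqrt(npq)
    theta1 = float((n * q + n * eps) % 1)   # fractional part of nq + t*sqrt(npq)
    main = math.erf(t / math.sqrt(2))
    extra = (1 - theta1 - theta2) * math.exp(-t * t / 2) / math.sqrt(2 * math.pi * npq)
    return t, main, extra


t, main, extra = approximate_terms(6520, Fraction(3, 5), Fraction(1, 50))
exact = exact_probability(6520, Fraction(3, 5), Fraction(1, 50))
print(round(t, 4), round(main, 6), round(extra, 6), round(main + extra, 6), round(exact, 6))
```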

Finally, let us apply our formulas to the solution of the inverse 
problem: How large should the number of trials be to secure a probability 
larger than a given fraction for the inequality 

$$\left| \frac{m}{n} - p \right| \leq \varepsilon?$$

Let us take, for example, p = 1/3, ε = 0.01 and the lower limit of probability
0.999. To find n approximately, we first determine t by the 
equation 

$$\frac{2}{\sqrt{2\pi}} \int_0^t e^{-\frac{u^2}{2}}\,du = 0.999,$$

which gives 

$$t = 3.291.$$

Hence, 

$$n = \frac{20000}{9}(3.291)^2 = 24,066, \text{ approximately.}$$

We cannot be sure that this limit is precise, since an approximate formula 
was used. But it can serve as an indication that for n exceeding this 
limit by a comparatively small amount, the probability in question will 
be > 0.999. For instance, let us take n = 24,300. The limits for m 
being 

$$8,100 - 243 \leq m \leq 8,100 + 243,$$

we find t from the equation 

$$t = \frac{243}{\sqrt{5400}} = 3.3068$$

and correspondingly 

$$\frac{2}{\sqrt{2\pi}} \int_0^t e^{-\frac{u^2}{2}}\,du = 0.999057.$$

The additional term in Laplace's formula being 0.000023, we find 

$$P > 0.99908 - 0.00006 > 0.999.$$

Thus, 24,300 trials surely satisfy all the requirements. 
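
The inverse problem can be sketched in two steps: solve the integral equation for t, then confirm a candidate n by subtracting the limit of the error from the approximate value. The code below finds t by bisection on `math.erf` and assumes the error limit (0.20 + 0.25|p - q|)/npq + e^{-1.5√npq}; since it carries more decimals for t than the three-place value 3.291, its first estimate of n differs slightly from 24,066.

```python
import math
from fractions import Fraction


def solve_t(target: float) -> float:
    """Solve (2/sqrt(2 pi)) * integral_0^t exp(-u^2/2) du = target by bisection."""
    lo, hi = 0.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if math.erf(mid / math.sqrt(2)) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def first_estimate(p: float, eps: float, target: float) -> float:
    """Approximate n from t*sqrt(npq) = eps*n."""
    t = solve_t(target)
    return t * t * p * (1 - p) / (eps * eps)


def guaranteed_lower_limit(n: int, p: Fraction, eps: Fraction) -> float:
    """Approximate probability minus the assumed limit of the error."""
    q = 1 - p
    npq = float(n * p * q)
    t = float(n * eps) / math.sqrt(npq)
    theta2 = float((n * p + n * eps) % 1)
    theta1 = float((n * q + n * eps) % 1)
    main = math.erf(t / math.sqrt(2))
    extra = (1 - theta1 - theta2) * math.exp(-t * t / 2) / math.sqrt(2 * math.pi * npq)
    limit = (0.20 + 0.25 * abs(float(p - q))) / npq + math.exp(-1.5 * math.sqrt(npq))
    return main + extra - limit


print(round(solve_t(0.999), 4), round(first_estimate(1 / 3, 0.01, 0.999)))
print(guaranteed_lower_limit(24300, Fraction(1, 3), Fraction(1, 100)))
```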

Problems for Solution 

1. Find approximately the probability that the number of successes will be contained
between 2,910 and 3,090 in 9,000 independent trials with constant probability 
1/3. Ans. 0.9570 with an error in absolute value < 10^{-4} [using (23)]. 

2. In Buffon's experiment a coin was tossed 4,040 times, with the result that heads 
turned up 2,048 times. What would be the probability of having more than 2,050 
or less than 1,990 heads? Ans. 0.337. 

3. R. Wolf threw a pair of dice 100,000 times and noted that 83,533 times the 
numbers of points on the two dice were different. What is the probability of having 
such an event occur not less than 83,533 or not more than 83,133 times? Does the 
result suggest a doubt that for each die the probability of any number of points was 1/6? 
Ans. This probability is approximately 0.0898 and on account of its smallness some 
doubt may exist. 

4. If the probability of an event E is 1/2, what number of trials guarantees a 
probability of more than 0.999 that the difference between the relative frequency of 
E and 1/2 will be in absolute value less than 0.01? Ans. 27,500. 

5. If a man plays 10,000 equitable games, staking $1 in each game, what is the 
probability that the increase or decrease in his fortune will not exceed $20 or $50? 
Ans. (a) 0.166; (b) 0.390. 

6. If a man plays 100,000 games of craps and stakes 50 cents in each game, what 
is the probability that he will lose less than $300? Ans. About 1/200. 

7. Following the method developed in this chapter, prove the following formula 
for the probability of exactly m successes in n independent trials with constant 
probability p: 

$$T_m = \frac{1}{\sqrt{2\pi npq}}\, e^{-\frac{t^2}{2}} \left[ 1 + \frac{(q - p)(t^3 - 3t)}{6\sqrt{npq}} \right] + \Delta$$

where t is determined by the equation 

$$m = np + t\sqrt{npq}$$

and 

$$|\Delta| < \frac{0.15 + 0.25|p - q|}{(npq)^{\frac{3}{2}}} + e^{-\frac{3}{2}\sqrt{npq}}$$

provided npq ≥ 25. 

8. Developments of this chapter can be greatly simplified if p = q = 1/2 (symmetrical
case). In this case one can prove the following statement: The probability 
of the inequalities 


n I 

2+2 + 




can be expressed as follows: 


P = 


1 


/: 




(r2^ - ^ 

12\/ 27m 


where |A| < l/2n® for n > 16, 

9. In case of "rare" events, the probability p may be so small that even for a 
large number of trials the quantity λ = np may be small; for example, 10 or less. 
In cases of this kind, approximation formulas of the type of Laplace's cannot be used 
with confidence. To meet such cases, Poisson proposed approximate formulas of a 
different character. Let P_m represent the probability that in n trials an event with 
the probability p will occur not more than m times. Show that 


= " [1 + 1 + 17 ^ + 


+ 


1 • 2 • 3 •• • m 


4* A 


+ A 


where 


and 


|A| < (e* - l)Qm if Qm^i 
|Al < (e^ - 1)(1 - Qm) if Qm < 1 


1 . 

X -f T H— 
4 n 


X 


2(n - X) 
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Indication of the Proof. We have 


X n X* 

P m = IH-h -“j —~ “i + ■ * * + ■ 

L Q 1 • 2 ^2 


Now, since q = I 




1 - - IM - - 
k n/\ n 


:) (■ 


1 • 2 • 3 • • • m 


qm 


Y-W-I V-f 


\ / » \ 


2 \ — k 

~x (X + iP 
. A: = 0 2 (n-\) 

< € ^ e 


Consequently 


(-O' 


e2(r.-X) 1 + + 


1 ■ 2 • 3 ■ ■ ■ m 


1 — - ) < e 


[ X X^ X” I 

—i + ri+ ■ + 1-2.8-.-. } 


On the other hand, 


1 = 


n(n — 1) • • • (n — M + 1) 


= ^ ^---j 


whence

$$1 - P_m < e^{\chi}(1 - Q_m),$$

$$P_m > e^{\chi}Q_m + 1 - e^{\chi}.$$

The final statement follows immediately from both inequalities obtained for $P_m$.

10. With the usual notation, show that


T ^ == e -^—0 
ml 


where 


rnX (n —m)X2 m(m — l) 
Q == = 


rrt — l)r /. 


(n — m,)\^ 


4- 


0 


3(n — X)-‘ 2ri{n — 7n) ^ 
Indication of the Proof. Referring to Chap. I, page 23, we have


0 < 0 < 1 . 




T 

Tm > 


But 


whence 


('-O' 


< € 


VI (n ~ m)X2 
” X ~ X-—-- 


„y- 

2n/ 


< e 


7n{m — 1) 
2n , 


•,/" 2"' 
m! 


mX (n — m)\- in (m — 1) 


On the other hand, 



X \ 


=0+' 

H— 


1 > <■ 

- X 

n 

- xj 




> 

^ w —1 m(in ~ 

- 1) 

+ 

711 

>e 2(n- 

w). 

n 

— mj 

r 



Hence 




^ , rnX {v — iu)\- m(m—l) (r? —w)X* 

“-X-) 


n 2n2 


2n • e ^(n — X)® 2n(n —w), 


and a fortiori 




> e 


- x+ 


mX (n — m)\'^ rn(m —1) 


2n2 2n 


(n — 


2n(n 


— m) 


If λ and m are both small in comparison to n the above-introduced factor Q will be
near 1. Under such circumstances we may be entitled to use an approximate formula
due to Poisson

$$T_m = \frac{\lambda^m}{m!}e^{-\lambda}.$$

The preceding elementary analysis gives means to estimate the error incurred by using
this formula.

11. Apply the preceding considerations to the case n = 1,000, p = 1/100, λ = 10
and m = 10. Ans. $0.1256 < T_{10} < 0.1258$. Poisson’s formula gives 0.1251, a
very good approximation. Also, $0.5807 < P_{10} < 0.5863$. Taking $P_{10} = 0.583$, the
error in absolute value will be less than $3.3 \cdot 10^{-3}$. By a more elaborate method it is
found $P_{10} = 0.5830$.
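
The figures quoted in Prob. 11 can be checked numerically. The following is only a sketch: the function names are illustrative, and the expression used for χ, namely (λ + 1/4 + λ³/n) / (2(n − λ)), is the reading of Prob. 9 assumed here; it is not an independent derivation.

```python
from math import comb, exp, factorial

def poisson_term(lam, m):
    """Poisson's approximate value for T_m."""
    return lam**m * exp(-lam) / factorial(m)

def binomial_term(n, p, m):
    """Exact probability T_m of m successes in n trials."""
    return comb(n, m) * p**m * (1 - p)**(n - m)

def poisson_cdf_bounds(n, p, m):
    """Return Q_m and the two limits e^chi*Q_m + 1 - e^chi < P_m < e^chi*Q_m."""
    lam = n * p
    q_m = sum(poisson_term(lam, k) for k in range(m + 1))
    # Assumed form of chi, as read from Prob. 9.
    chi = (lam + 0.25 + lam**3 / n) / (2 * (n - lam))
    upper = exp(chi) * q_m
    lower = upper + 1 - exp(chi)
    return q_m, lower, upper

n, p, m = 1000, 1 / 100, 10
q_m, lower, upper = poisson_cdf_bounds(n, p, m)
exact_cdf = sum(binomial_term(n, p, k) for k in range(m + 1))
print("T_10 exact:", binomial_term(n, p, m), "Poisson:", poisson_term(n * p, m))
print("limits for P_10:", lower, upper, "exact:", exact_cdf)
```

The exact binomial sum is included only as a cross-check; for much larger n the direct sum becomes expensive, which is where limits of this kind are useful.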
CHAPTER VIII


FURTHER CONSIDERATIONS ON GAMES OF CHANCE 

1. When a person undertakes to play a very large number of games 
under theoretically identical conditions, the inference to be drawn from 
Bernoulli’s theorem is that that person will almost certainly be ruined
if the mathematical expectation of his gain in a single game is negative.
In case of a positive expectation, on the other hand, he is very likely to
win as large a sum as he likes in a sufficiently long series of games.
Finally, in an equitable game when the mathematical expectation of a 
gain is zero, the only inference to be drawn from Bernoulli’s theorem is 
that his gain or loss will likely be small in comparison with the number of 
games played. 

These conclusions are appropriate, however, only if it is possible to
continue the series of games indefinitely, with an agreement to postpone 
the final settling of accounts until the end of the series. But if the 
settlement, as in ordinary gambling, is made at the end of each game, 
it may happen that even playing a profitable game one will lose all his 
money and will have to discontinue playing long before the number of 
games becomes large enough to enable him to realize the advantages 
which continuation of the games would bring to him. 

A whole series of new problems arises in this connection, known as 
problems on the duration of play or ruin of gamblers. Since the science 
of probability had its humble origin in computing chances of players in
different games, the important question of the ruin of gamblers was 
discussed at a very early stage in the historical development of the 
theory of probability. The simplest problem of this kind was solved by 
Huygens, who in this field had such great successors as de Moivre, 
Lagrange, and Laplace. 

2. It is natural to attack the problem first in its simplest aspect, and 
then to proceed to more involved and difficult questions. 

Problem 1. Two players A and B play a series of games, the probability
of winning a single game being p for A and q for B, and each game
ends with a loss for one of them. If the loser after each game gives his
adversary an amount representing a unit of money and the fortunes of
A and B are measured by the whole numbers a and b, what is the probability
that A (or B) will be ruined if no limit is set for the number of
games?



Solution. It is necessary first to show how we can attach a definite
numerical value to the probability of the ruin of A if no limit is set for
the number of games. As in many similar cases (see, for instance, Prob.
15, page 41) we start by supposing that a limit is set. Let n be this
limit. There is only a finite number of mutually exclusive ways in which
A can be ruined in n games; either he can be ruined just after the first
game, or just after the second, and so on. Denoting by $p_1, p_2, \ldots, p_n$
the probabilities for A to be ruined just after the first, second, . . . nth
game, the probability of his ruin before or at the nth game is

$$p_1 + p_2 + \cdots + p_n.$$

Now, this sum, being a probability, must remain < 1 whatever n is.
On the other hand, each term of this sum is ≥ 0 for the same reason.
Both remarks combined show that the series

$$p_1 + p_2 + p_3 + \cdots$$

is convergent. We take its sum as the probability for A to be ruined
when nothing limits the number of games played. So it is clear that
this probability, although unknown, possesses a perfectly determined
numerical value. Let us denote by $y_x$ the probability for A to be ruined
when his fortune is x. The probability we seek is $y_a$. Obviously,

$$y_0 = 1 \tag{1}$$

for A is certainly ruined if he has no money left. Similarly

$$y_{a+b} = 0 \tag{2}$$

because if the fortune of A is a + b, it means that B has no money wherewith
to play, and certainly the ruin of A is then impossible. Further,
considering the result of the game immediately following the situation
in which the fortune of A amounted to x, it is possible to establish an
equation in finite differences which $y_x$ satisfies. For, if A wins this game
(the probability of which case is p), his fortune becomes x + 1 and the
probability of being ruined later is $y_{x+1}$. By the theorem of compound
probability, the probability of this case is $py_{x+1}$. But if A loses (the
probability of which is q), his fortune becomes x − 1 and the probability
that the one possessing this fortune will be ruined is $y_{x-1}$. The probability
of this case is $qy_{x-1}$. Now, applying the theorem of total probability,
we arrive at the equation

$$y_x = py_{x+1} + qy_{x-1}. \tag{3}$$

This equation has a particular solution of the form $\alpha^x$ where α is a
root of the equation

$$\alpha = p\alpha^2 + q.$$

If p ≠ q there are two roots

$$1, \quad \frac{q}{p}$$

and, correspondingly, there are two distinct particular solutions of
equation (3):

$$1 \quad \text{and} \quad \left(\frac{q}{p}\right)^x.$$

Obviously,

$$y_x = C + D\left(\frac{q}{p}\right)^x$$

is also a solution of (3) for arbitrary C and D. Now, we can dispose of
C and D so as to satisfy conditions (1) and (2). To this end we have the
equations

$$C + D = 1$$

$$p^{a+b}C + q^{a+b}D = 0,$$

whence

$$C = \frac{q^{a+b}}{q^{a+b} - p^{a+b}}; \quad D = -\frac{p^{a+b}}{q^{a+b} - p^{a+b}}$$

and

$$y_x = \frac{q^{a+b}p^x - p^{a+b}q^x}{p^x(q^{a+b} - p^{a+b})}.$$

It remains to take x = a to obtain the required probability

$$y_a = \frac{q^a(q^b - p^b)}{q^{a+b} - p^{a+b}} = \frac{q^a(p^b - q^b)}{p^{a+b} - q^{a+b}}$$

that the player A possessing the fortune a will be ruined. Similarly,
the probability of the ruin of B is

$$z_b = \frac{p^b(p^a - q^a)}{p^{a+b} - q^{a+b}}.$$

It turns out that

$$y_a + z_b = 1,$$

so that the probability that the series of games will continue indefinitely 
without A or B being ruined, is 0. The probability 0 does not show the 
impossibility of an eternal game, because this number was obtained, 
not by direct enumeration of cases, but by passage to the limit. Theoretically,
an eternal game is not excluded. Actually, of course, this
possibility can be disregarded.

If p = q = 1/2, so that each single game is equitable, the preceding
solution must be modified. In this case, the above quadratic equation
in α has two coincident roots = 1, and we have only one particular
solution of (3), $y_x = 1$. But another particular solution in this case is
x, so that we can assume

$$y_x = C + Dx$$

and determine C and D from the equations

$$C = 1; \quad C + D(a + b) = 0.$$

Thus, we find that

$$y_x = 1 - \frac{x}{a + b}$$

and for x = a

$$y_a = \frac{b}{a + b}.$$

Similarly, giving $z_b$ the same meaning as above,

$$z_b = \frac{a}{a + b}.$$
If, therefore, each single game is equitable, the probabilities of ruin are 
inversely proportional to the fortunes of the players. The practical 
conclusion to be derived from this theoretical result is sheer common 
sense: It is unwise to play indefinitely with an adversary whose fortune 
is very large without submitting oneself to the great risk of losing all 
one's money in the course of the games, even if each single game is
equitable. Gamblers who gamble at an even game with any willing 
individual are in the same condition as if they were gambling with an 
infinitely rich adversary. Their ruin in the long run is practically 
certain. 
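
The closed expressions for the probability of ruin obtained in Prob. 1 lend themselves to a direct check. The sketch below (function names are illustrative, not from the text) evaluates the formula in exact rational arithmetic and verifies that it satisfies conditions (1), (2) and the difference equation (3).

```python
from fractions import Fraction

def ruin_probability(p, a, b):
    """Probability that A, holding a units, is ruined by B, holding b units."""
    p = Fraction(p)
    q = 1 - p
    if p == q:
        return Fraction(b, a + b)              # equitable game
    return q**a * (p**b - q**b) / (p**(a + b) - q**(a + b))

def check_difference_equation(p, a, b):
    """Verify y_0 = 1, y_{a+b} = 0 and y_x = p*y_{x+1} + q*y_{x-1} exactly."""
    p = Fraction(p)
    q = 1 - p
    total = a + b
    y = [ruin_probability(p, x, total - x) for x in range(total + 1)]
    interior = all(y[x] == p * y[x + 1] + q * y[x - 1] for x in range(1, total))
    return y[0] == 1 and y[total] == 0 and interior

print(ruin_probability(Fraction(1, 2), 3, 7))      # equitable game: b / (a + b)
print(check_difference_equation(Fraction(2, 5), 4, 6))
```

The probability of the ruin of B follows from the same function by interchanging the roles of the players, that is, `ruin_probability(1 - p, b, a)`.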

If single games of the series are not equitable, that is, p ≠ q, the
conclusion may be different. Supposing p > q, we have a case when
the expectation of A is positive; in each single game, A has an advantage
over his adversary. The above expression for $y_a$ may be written in the
form

$$y_a = \left(\frac{q}{p}\right)^a \frac{1 - \left(\frac{q}{p}\right)^b}{1 - \left(\frac{q}{p}\right)^{a+b}}$$

and, because q/p < 1, it is easy to see that $y_a$ remains always less than

$$\left(\frac{q}{p}\right)^a$$

and converges to this number when b becomes infinite. Thus, playing a
series of advantageous games even against an infinitely rich adversary,
the probability of escaping ruin is

$$1 - \left(\frac{q}{p}\right)^a.$$

If a is large enough, this can be made as near 1 as we please, so that a
player with a large fortune has good reason to believe that in the course
of the games he will never be ruined, but that actually he is very likely
to win a large sum of money.

This conclusion again is confirmed by experience. Big gambling 
institutions, like the Casino at Monte Carlo, always reserve certain 
advantages to themselves, and, although they are willing to play with 
practically everybody (as if they played against an infinitely rich adversary)
the chance of their being ruined is slight because of the large
capital in their possession. 
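
To give an idea of the capital involved, the following sketch computes the probability of escaping ruin against an infinitely rich adversary, 1 − (q/p)^a, and the smallest fortune for which the risk of ruin falls below a prescribed level. The value p = 0.51 and the risk level are illustrative assumptions, not data from the text.

```python
def escape_probability(p, a):
    """Probability 1 - (q/p)**a of never being ruined when p > q."""
    q = 1 - p
    return 1 - (q / p) ** a

def capital_needed(p, risk):
    """Smallest whole fortune a (in stakes) with (q/p)**a < risk, for p > 1/2."""
    ratio = (1 - p) / p
    a = 1
    while ratio ** a >= risk:
        a += 1
    return a

# Hypothetical house advantage of 51 to 49 in each single game.
print(escape_probability(0.51, 100))
print(capital_needed(0.51, 0.001))
```

Even so slight an advantage makes ruin very unlikely once the fortune amounts to a couple of hundred stakes.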

3. In the problem solved above the stakes of both players were 
supposed to be equal, and we took them as units to measure the fortunes 
of both players. Next it would be interesting to investigate the case in 
which the stakes of A and B are unequal. An exact solution of this 
modified problem, since it depends on a difference equation of higher
order, would be too complicated to be of practical use. It is therefore 
extremely interesting that, following an ingenious method developed by 
A. A. Markoff, one can establish simple inequalities for the required 
probabilities which give a good approximation if the fortunes of the 
players are large in comparison with their stakes. 

Problem 2. If the conditions presupposed in Prob. 1 are modified,
in that the stakes of A and B measured in a convenient unit are α and
β, and their respective fortunes are a and b, find the probabilities for A or
B to be ruined in the sense that at a certain stage the capital of A will
become less than α or that of B less than β.

Solution. Let $y_x$ be the probability for A to be forced out of the
game by the lack of sufficient money to set a full stake α when his
fortune amounts to x and consequently that of his adversary is a + b − x.
In the same way as before, we find that $y_x$ is a solution of the equation
in finite differences:

$$y_x = py_{x+\beta} + qy_{x-\alpha}. \tag{4}$$

To determine $y_x$ completely, in addition to (4), we have two sets of
supplementary conditions:

$$y_0 = y_1 = \cdots = y_{\alpha-1} = 1 \tag{5}$$

$$y_{a+b} = y_{a+b-1} = \cdots = y_{a+b-(\beta-1)} = 0. \tag{6}$$

Equation (5) expresses the fact that if the fortune of A becomes less
than his stake, it is certain that A must quit. On the contrary, equation
(6) indicates the impossibility for A to be ruined if the other player B
does not have enough money to continue gaming. Equation (4) is an
ordinary equation in finite differences of the order α + β. It has particular
solutions of the form $\theta^x$ where θ is a root of the equation

$$p\theta^{\alpha+\beta} - \theta^{\alpha} + q = 0. \tag{7}$$

The left-hand member for θ = 0 is positive and with increasing θ decreases
and attains a minimum when

$$p\theta^{\beta} = \frac{\alpha}{\alpha + \beta}$$

and then steadily increases and assumes positive values for large θ.
This minimum must be negative or zero because θ = 1 is a root of (7).
Now, if it is negative, there are two positive roots of (7). One of them
is θ = 1 and another > or < 1 according as

$$p < \frac{\alpha}{\alpha + \beta} \quad \text{or} \quad p > \frac{\alpha}{\alpha + \beta}$$

or else

$$p\beta - q\alpha < 0 \quad \text{or} \quad > 0.$$

That is, the positive root of (7) different from 1 is >1 when single games
are favorable to B and <1 if they are favorable to A. In case of equitable
games, both positive roots coincide and θ = 1 is a double root of (7).
All the other roots of (7) are negative or imaginary.
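
The positive root of (7) different from 1 generally has to be found numerically. A minimal sketch by bisection follows; it relies only on the behavior of the left-hand member described above (positive at θ = 0, a single minimum where pθ^β = α/(α + β), positive for large θ). The function name and tolerance are illustrative choices.

```python
def other_positive_root(p, alpha, beta):
    """Positive root of p*theta**(alpha+beta) - theta**alpha + q = 0 other than 1."""
    q = 1 - p

    def f(theta):
        return p * theta ** (alpha + beta) - theta ** alpha + q

    if abs(p * beta - q * alpha) < 1e-12:
        return 1.0                                   # equitable games: double root
    theta_min = (alpha / (p * (alpha + beta))) ** (1 / beta)
    neg = theta_min                                  # f is negative at its minimum
    if theta_min > 1:                                # root lies to the right of the minimum
        pos = 2 * theta_min
        while f(pos) <= 0:
            pos *= 2
    else:                                            # root lies between 0 and the minimum
        pos = 0.0
    for _ in range(200):                             # plain bisection
        mid = (neg + pos) / 2
        if f(mid) < 0:
            neg = mid
        else:
            pos = mid
    return (neg + pos) / 2

print(other_positive_root(0.6, 2, 1))   # games favorable to B: root > 1
print(other_positive_root(0.7, 2, 1))   # games favorable to A: root < 1
```

With α = 2, β = 1 equation (7) is a cubic that factors by θ − 1, so both printed roots can be compared with the roots of the remaining quadratic.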

The regular way to solve the problem would be to write down the
general solution of (4) involving α + β arbitrary constants to be determined
by conditions (5) and (6). As this method would lead to a complicated
expression for $y_x$, we shall refrain from seeking the exact solution
of our problem, and instead, following A. A. Markoff's ingenious remark,
we shall establish simple lower and upper limits for $y_x$ which are close
enough if the fortunes of the players are large in comparison with their
stakes.

Lemma. If $y_x$ is a solution of equation (4) and none of the numbers

$$y_0, y_1, \ldots, y_{\alpha-1}$$

$$y_{a+b}, y_{a+b-1}, \ldots, y_{a+b-\beta+1}$$

is negative, then $y_x \geq 0$ for x = 0, 1, 2, . . . a + b.

Proof. Let $u_x^{(k)}$ (k = 0, 1, 2, . . . α − 1) represent the probability
that the player A whose actual fortune is x (and that of his adversary
a + b − x) will be forced to quit when his fortune becomes exactly = k.
Evidently $u_x^{(k)}$ is a solution of equation (4) satisfying the conditions

$$u_x^{(k)} = 0 \quad \text{for} \quad x = 0, 1, \ldots, k - 1, k + 1, \ldots, \alpha - 1;\ a + b, a + b - 1, \ldots, a + b - \beta + 1; \qquad u_k^{(k)} = 1.$$

Similarly, if $v_x^{(l)}$ (l = 0, 1, 2, . . . β − 1) represents the probability that
the player B will be forced to quit when the fortune of A becomes exactly
= a + b − l, $v_x^{(l)}$ will be a solution of (4) satisfying the conditions

$$v_x^{(l)} = 0 \quad \text{for} \quad x = 0, 1, 2, \ldots, \alpha - 1;\ a + b, \ldots, a + b - l + 1, a + b - l - 1, \ldots, a + b - \beta + 1; \qquad v_{a+b-l}^{(l)} = 1.$$

Thus we get α + β particular solutions of (4), and it is almost evident
that these solutions are independent. Moreover, since they represent
probabilities, $u_x^{(k)} \geq 0$, $v_x^{(l)} \geq 0$ for x = 0, 1, 2, . . . a + b. Now, any
solution $y_x$ of (4) with given values of

$$y_0, y_1, \ldots, y_{\alpha-1}$$

$$y_{a+b}, y_{a+b-1}, \ldots, y_{a+b-\beta+1}$$

can be represented thus

$$y_x = \sum_{k=0}^{\alpha-1} y_k u_x^{(k)} + \sum_{l=0}^{\beta-1} y_{a+b-l} v_x^{(l)}.$$

Hence, $y_x \geq 0$ for x = 0, 1, 2, . . . a + b if none of the numbers

$$y_0, y_1, \ldots, y_{\alpha-1}$$

$$y_{a+b}, y_{a+b-1}, \ldots, y_{a+b-\beta+1}$$

is negative. This interesting property of the solutions of equation (4),
derived almost intuitively from the consideration of probabilities, can be
established directly. (See Prob. 9, page 160.)

The lemma just proved yields almost immediately the following
proposition: If for any two solutions $y'_x$ and $y''_x$ of equation (4) the
inequality

$$y''_x \geq y'_x$$

holds for

$$x = 0, 1, 2, \ldots, \alpha - 1;\ a + b, a + b - 1, \ldots, a + b - \beta + 1,$$

the same inequality will be true for all x = 0, 1, 2, . . . a + b. It
suffices to notice that $y_x = y''_x - y'_x$ is a solution of the linear equation
(4) and, by hypothesis, $y_x \geq 0$ for x = 0, 1, 2, . . . α − 1; a + b,
a + b − 1, . . . a + b − β + 1.

Now we can come back to our problem. First, if the mathematical
expectation of A

$$p\beta - q\alpha$$

is different from 0, equation (7) has two positive roots: 1 and θ. With
arbitrary constants C and D

$$y'_x = C + D\theta^x$$

is a solution of (4). Whatever C and D may be, $y'_x$ as a function of x
varies monotonically. Therefore, if C and D are determined by the
conditions

$$y'_0 = 1, \quad y'_{a+b-\beta+1} = 0$$

we shall have

$$y'_x \leq 1 \quad \text{if} \quad x = 0, 1, 2, \ldots, \alpha - 1$$

$$y'_x \leq 0 \quad \text{if} \quad x = a + b - \beta + 1, \ldots, a + b$$

and by the above established lemma, taking into account conditions (5)
and (6), we shall have for the required probability the following inequality

$$y_x \geq y'_x;$$

or, substituting the explicit expression for $y'_x$,

$$y_x \geq \frac{\theta^{a+b-\beta+1} - \theta^x}{\theta^{a+b-\beta+1} - 1}.$$

If, on the contrary, C and D are determined by

$$y'_{\alpha-1} = 1, \quad y'_{a+b} = 0$$

we shall have

$$y'_x \geq 1 \quad \text{if} \quad x = 0, 1, 2, \ldots, \alpha - 1$$

$$y'_x \geq 0 \quad \text{if} \quad x = a + b - \beta + 1, \ldots, a + b$$

and

$$y_x \leq \frac{\theta^{a+b-\alpha+1} - \theta^{x-\alpha+1}}{\theta^{a+b-\alpha+1} - 1}.$$

Finally, taking x = a, we obtain the following limits for the initial
probability $y_a$:

$$\frac{\theta^a(\theta^{b-\beta+1} - 1)}{\theta^{a+b-\beta+1} - 1} \leq y_a \leq \frac{\theta^{a-\alpha+1}(\theta^b - 1)}{\theta^{a+b-\alpha+1} - 1}.$$

They give a sufficient approximation to $y_a$ if a and b are large compared
with α and β.

If each single game is equitable, equation (4) has a solution with two
arbitrary constants:

$$y'_x = C + Dx.$$

Proceeding in the same way as before, we obtain the inequalities

$$\frac{b - \beta + 1}{a + b - \beta + 1} \leq y_a \leq \frac{b}{a + b - \alpha + 1}.$$
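
For moderate fortunes the exact solution of equation (4) with conditions (5) and (6) can still be computed as a linear system, which allows the limits for the equitable case to be compared with the true value. The sketch below does this in exact rational arithmetic; the solver, its name and the numerical example (stakes α = 2, β = 1 with p = 2/3, so that pβ − qα = 0) are illustrative assumptions, and a ≥ α, b ≥ β is presumed.

```python
from fractions import Fraction

def solve_ruin(p, alpha, beta, a, b):
    """Exact y_x for alpha <= x <= a+b-beta from equation (4) with (5) and (6)."""
    p = Fraction(p)
    q = 1 - p
    total = a + b
    first, last = alpha, total - beta          # fortunes at which play continues
    n = last - first + 1

    def fixed(x):
        if x < alpha:
            return Fraction(1)                 # condition (5): A must quit
        if x > total - beta:
            return Fraction(0)                 # condition (6): B must quit
        return None                            # value still unknown

    # Augmented matrix of y_x - p*y_{x+beta} - q*y_{x-alpha} = 0.
    rows = []
    for x in range(first, last + 1):
        row = [Fraction(0)] * (n + 1)
        row[x - first] += 1
        for weight, nxt in ((p, x + beta), (q, x - alpha)):
            known = fixed(nxt)
            if known is None:
                row[nxt - first] -= weight
            else:
                row[n] += weight * known
        rows.append(row)

    # Gauss-Jordan elimination with exact fractions.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [vr - factor * vc for vr, vc in zip(rows[r], rows[col])]
    return {x: rows[x - first][n] for x in range(first, last + 1)}

def equitable_limits(alpha, beta, a, b):
    """Lower and upper limits for y_a when p*beta = q*alpha."""
    return (Fraction(b - beta + 1, a + b - beta + 1),
            Fraction(b, a + b - alpha + 1))

y = solve_ruin(Fraction(2, 3), 2, 1, 10, 10)
print(equitable_limits(2, 1, 10, 10), y[10])
```

With unit stakes (α = β = 1) both limits coincide with b/(a + b), the result of Prob. 1.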
4. To simplify the analysis, it was supposed that nothing limited the
number of games played by A and B so that an eternal game, although 
extremely improbable, was theoretically possible. We now turn to 
problems in which the number of games is limited. 

Problem 3. Players A and B agree to play not more than n games.
The probabilities of winning a single game are p and q, respectively, and
the stakes are equal. Taking these stakes as monetary units, the fortune
of A is measured by the whole number a and that of B is infinite or at
least so large that he cannot be ruined in n games. What is the probability
for A to be ruined in the course of n games?

Solution. Let $y_{x,t}$ represent the probability for A to be ruined when
his fortune is measured by the number x and he cannot play more than
t games. The reasoning we have used several times shows that $y_{x,t}$
satisfies a partial equation in finite differences:

$$y_{x,t} = py_{x+1,t-1} + qy_{x-1,t-1}. \tag{8}$$

Moreover, if A has no money left, his ruin is certain, which gives the
condition

$$y_{0,t} = 1 \quad \text{if} \quad t \geq 0. \tag{9}$$

On the other hand, if A still possesses money and cannot play any more,
his ruin is impossible, so that

$$y_{x,0} = 0 \quad \text{if} \quad x > 0. \tag{10}$$

Conditions (9) and (10) together with equation (8) determine $y_{x,t}$
completely for all positive values of x and t. To find an explicit expression
for $y_{x,t}$ we shall use Lagrange's method. Equation (8) has particular
solutions of the form

$$\alpha^x\beta^t$$

where α and β satisfy the relation

$$\alpha\beta = p\alpha^2 + q.$$

We can solve this equation either for β or for α which leads to two different
expressions of $y_{x,t}$. Solving for β we have infinitely many particular
solutions

$$\alpha^x(p\alpha + q\alpha^{-1})^t$$

with an arbitrary α and we can seek to obtain the required solution in the
form

$$y_{x,t} = \frac{1}{2\pi i}\int_c \alpha^{x-1}(p\alpha + q\alpha^{-1})^t f(\alpha)\,d\alpha$$

where f(α) is supposed to be developable in Laurent's series on a certain
circle c. To satisfy (10) we must have

$$\frac{1}{2\pi i}\int_c \alpha^{x-1}f(\alpha)\,d\alpha = 0 \quad \text{for} \quad x = 1, 2, 3, \ldots$$

which shows that f(α) is regular within the circle c. To determine f(α)
completely, we must have, according to (9)

$$\frac{1}{2\pi i}\int_c (p\alpha + q\alpha^{-1})^t f(\alpha)\frac{d\alpha}{\alpha} = 1 \quad \text{for} \quad t = 0, 1, 2, \ldots.$$

All these equations are equivalent to a single equation

$$\frac{1}{2\pi i}\int_c \frac{f(\alpha)\,d\alpha}{\alpha - p\epsilon\alpha^2 - q\epsilon} = \frac{1}{1 - \epsilon}$$

holding good for all sufficiently small ε. The integrand has a single pole
$\alpha_0$ within c defined by

$$\alpha_0 - p\epsilon\alpha_0^2 - q\epsilon = 0,$$

and the corresponding residue is

$$\frac{q + p\alpha_0^2}{q - p\alpha_0^2}f(\alpha_0).$$

But this must be equal to

$$\frac{1}{1 - \epsilon}$$

or, substituting for ε its expression in $\alpha_0$

$$\frac{q + p\alpha_0^2}{p\alpha_0^2 - \alpha_0 + q}$$

and hence for all sufficiently small $\alpha_0$

$$f(\alpha_0) = \frac{q - p\alpha_0^2}{p\alpha_0^2 - \alpha_0 + q};$$

that is, if

$$f(\alpha) = \frac{q - p\alpha^2}{p\alpha^2 - \alpha + q}$$

all the requirements are satisfied. Taking into account that p + q = 1,
we have

$$f(\alpha) = \frac{1}{1 - \alpha} + \frac{p\alpha}{q - p\alpha},$$

and also

$$f(\alpha) = 1 + \sum_{n=1}^{\infty}\left[1 + \left(\frac{p}{q}\right)^n\right]\alpha^n.$$

The expression for $y_{x,t}$ is therefore

$$y_{x,t} = \frac{1}{2\pi i}\int_c \alpha^{x-1}(p\alpha + q\alpha^{-1})^t \sum_{n=0}^{\infty} C_n\alpha^n\,d\alpha$$

where $C_0 = 1$ and $C_n = 1 + (p/q)^n$ if n ≥ 1.

It remains to find the coefficient of 1/α in the development of the
integrand in a series of descending powers of α. Since

$$\alpha^{x-1}(p\alpha + q\alpha^{-1})^t = \sum_{l=0}^{t} \binom{t}{l} p^l q^{t-l}\alpha^{x-t+2l-1},$$

this coefficient is given by the sum

$$\sum_{l=0}^{\frac{t-x}{2}} \binom{t}{l} p^l q^{t-l}\, C_{t-x-2l}$$

extended over all integers l from 0 up to the greatest integer not exceeding
$\frac{t - x}{2}$. Hence, the final expression for the probability $y_{a,n}$ is

$$y_{a,n} = q^a\sum_{l=0}^{\frac{n-a}{2}} \binom{n}{l} (pq)^l\left[p^{n-a-2l} + q^{n-a-2l}\right] \tag{11}$$

with the agreement, in case of an even n − a, to replace the sum

$$p^0 + q^0$$

corresponding to $l = \frac{n - a}{2}$ by 1. It is natural that the right-hand
member of the preceding expression should be replaced by 0 if n < a,
which is in perfect agreement with the fact that A cannot be ruined in less
than a games.

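The second form of solution is obtained if we express α as a function of
β. The equation

$$p\alpha^2 - \alpha\beta + q = 0$$

having two roots, we shall take for α the root

$$\alpha = \frac{\beta - \sqrt{\beta^2 - 4pq}}{2p}$$

determined by the condition that it vanishes for infinitely large positive
β and can be developed in power series of 1/β when $|\beta| > 2\sqrt{pq}$. Using
α in this perfectly determined sense, it is easy to verify that

$$y_{x,t} = \frac{1}{2\pi i}\int_c \left(\frac{\beta - \sqrt{\beta^2 - 4pq}}{2p}\right)^x \frac{\beta^t\,d\beta}{\beta - 1}$$

where c is a circle of radius > 1 described from 0 as its center, satisfies all
the requirements. For it is a solution of equation (8). Next, for x = 0
and t ≥ 0,

$$y_{0,t} = \frac{1}{2\pi i}\int_c \frac{\beta^t\,d\beta}{\beta - 1} = 1$$

and, finally, for t = 0 and x > 0

$$y_{x,0} = \frac{1}{2\pi i}\int_c \left(\frac{\beta - \sqrt{\beta^2 - 4pq}}{2p}\right)^x \frac{d\beta}{\beta - 1} = 0$$

because the development of the integrand into power series of 1/β
starts at least with the second power of 1/β.

To find $y_{x,t}$ in explicit form, it remains to find the coefficient of 1/β
in the development of

$$\left(\frac{\beta - \sqrt{\beta^2 - 4pq}}{2p}\right)^x \frac{\beta^t}{\beta - 1}$$

in a series of descending powers of β. Let

$$\left(\frac{\beta - \sqrt{\beta^2 - 4pq}}{2p}\right)^x = \frac{l_x}{\beta^x} + \frac{l_{x+1}}{\beta^{x+1}} + \frac{l_{x+2}}{\beta^{x+2}} + \cdots;$$

multiplying this series by

$$\frac{\beta^t}{\beta - 1} = \beta^{t-1} + \beta^{t-2} + \cdots + 1 + \frac{1}{\beta} + \frac{1}{\beta^2} + \cdots$$

we find that the coefficient of 1/β in the product is

$$l_x + l_{x+1} + \cdots + l_t,$$

and hence

$$y_{x,t} = l_x + l_{x+1} + \cdots + l_t$$

provided t ≥ x, for otherwise $y_{x,t} = 0$. The quadratic equation in α
can be written in the form

$$\alpha = \frac{1}{\beta}(q + p\alpha^2)$$

and the development of any power of its root vanishing for β = ∞ into
power series of 1/β can be obtained by application of Lagrange's series.
We have

$$\alpha^x = \sum_{n=1}^{\infty} \frac{1}{n!\,\beta^n}\left[\frac{d^{n-1}\,x\xi^{x-1}(q + p\xi^2)^n}{d\xi^{n-1}}\right]_{\xi=0}$$

but

$$\frac{1}{n!}\left[\frac{d^{n-1}\,x\xi^{x-1}(q + p\xi^2)^n}{d\xi^{n-1}}\right]_{\xi=0} = \frac{x(x + 2i - 1)!}{i!\,(x + i)!}p^iq^{x+i}$$

if n = x + 2i, and = 0 if n = x + 2i + 1. Hence,

$$l_{x+2i} = \frac{x(x + 2i - 1)!}{i!\,(x + i)!}p^iq^{x+i}, \qquad l_{x+2i+1} = 0,$$

and finally

$$y_{a,n} = q^a\left[1 + \frac{a}{1}pq + \frac{a(a + 3)}{1 \cdot 2}(pq)^2 + \frac{a(a + 4)(a + 5)}{1 \cdot 2 \cdot 3}(pq)^3 + \cdots + \frac{a(a + k + 1) \cdots (a + 2k - 1)}{1 \cdot 2 \cdot 3 \cdots k}(pq)^k\right] \tag{12}$$

where $k = \frac{n - a}{2}$ or $k = \frac{n - a - 1}{2}$ according as n and a are of the
same parity or not.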

5. The difference $y_{a,n} - y_{a,n-1}$ gives the probability for the player A
to be ruined at exactly the nth game and not before. Now, this difference
is 0 if n differs from a by an odd number, so that the probability of
ruin at the (a + 2i − 1)st game is 0. That is almost evident because
after every game the fortune of A is increased or diminished by 1 and
therefore can be reduced to 0 only if the number of games played is of
the same parity as a. If n = a + 2i, the difference $y_{a,n} - y_{a,n-1}$ is

$$\frac{a(a + i + 1)(a + i + 2) \cdots (a + 2i - 1)}{1 \cdot 2 \cdot 3 \cdots i}q^{a+i}p^i.$$

Such, therefore, is the probability for A to be ruined at exactly the
(a + 2i)th game. The remarkable simplicity of this expression obtained
by means which are not quite elementary leads to a suspicion that it
might also be obtained in a simple way. And, indeed, there is a simple
way to arrive at this expression and thus to have a third, elementary,
solution of Prob. 3.

Considering the possible results of a series of a + 2i games, let A
stand for a game won by A, and B for a game lost by A. The result of
every series will thus be represented by a succession of letters A and B.
We are interested in finding all the sequences which ruin A at exactly
the last game. Because the fortune of A sinks from a to 0 there must be
i letters A and i + a letters B in every sequence we consider. Besides,
there is another important condition. Let us imagine that the sequence
is divided into two arbitrary parts, one containing the first letter and
another the last letter of the sequence. Let x be the number of letters B,
and y that of letters A in the second or right part of the sequence. There
will be a + i − x letters B and i − y letters A in the first or left part.
It means that the fortune of A after a game corresponding to the last
letter in the left part, becomes

$$a + i - y - (a + i - x) = x - y$$

and since A cannot be ruined before the (a + 2i)th game, x must always
be > y. That is, counting letters A and B from the right end of the
sequence, the number of letters B must surpass the number of letters A
at every stage. Conversely, if this condition is satisfied the succession
represents a series of games resulting in the ruin of A at the end of the
series and not before.

To find directly the number of sequences satisfying this requirement
is not so easy, and it is much easier, following an ingenious method
proposed by D. André, to find the number of all the remaining sequences
of i letters A and i + a letters B. These can be divided into two classes:
those ending with A and those ending with B. Now, it is easy to show
that there exists a one-to-one correspondence between successions of these
two classes, so that both classes contain the same number of sequences.
For, in a sequence of the second class (ending with B) starting from
the right end, we necessarily find a shortest group of letters containing
A and B in equal numbers. This group must end with A. Writing
letters of this group in reverse order without changing the preceding
letters, we obtain a sequence of the first class ending with A. Conversely,
in a sequence of the first class there is a shortest group at the
right end ending with B and containing an equal number of letters A and
B. Writing letters of this group in reverse order, we obtain a sequence
of the second class.

An example will illustrate the described manner of establishing the
one-to-one correspondence between sequences of the first and of the
second class. Consider a sequence of the first kind

B|BBABAA.

The vertical bar separates the shortest group from the right containing
letters A and B in equal numbers. Reversing the order of letters in this
group, we obtain a sequence of the second class

B|AABABB

and this sequence, by application of the above rule, is transformed again
into the original sequence of the first class. The number of sequences
of the first class can now be easily found. It is the same as the number of
all possible sequences of i − 1 letters A and a + i letters B, that is,

$$\frac{(a + 2i - 1)!}{(i - 1)!\,(a + i)!} = \frac{(a + i + 1)(a + i + 2) \cdots (a + 2i - 1)}{1 \cdot 2 \cdots (i - 1)}.$$

The total number of sequences in both classes is

$$2\,\frac{(a + i + 1)(a + i + 2) \cdots (a + 2i - 1)}{1 \cdot 2 \cdots (i - 1)}.$$

Hence, the number of sequences leading to ruin of A in exactly a + 2i
games is

$$\frac{(a + i + 1)(a + i + 2) \cdots (a + 2i)}{1 \cdot 2 \cdot 3 \cdots i} - 2\,\frac{(a + i + 1)(a + i + 2) \cdots (a + 2i - 1)}{1 \cdot 2 \cdots (i - 1)} = \frac{a(a + i + 1) \cdots (a + 2i - 1)}{1 \cdot 2 \cdot 3 \cdots i}.$$
As the probability of gains and losses indicated by every such sequence
is the same, namely, $q^{a+i}p^i$, the probability of the ruin of A in exactly
a + 2i games is

$$\frac{a(a + i + 1) \cdots (a + 2i - 1)}{1 \cdot 2 \cdot 3 \cdots i}q^{a+i}p^i$$

and hence the second expression found for $y_{a,n}$ follows immediately.
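
The agreement of the different solutions of Prob. 3 can be tested numerically. The following sketch computes $y_{a,n}$ in three ways: directly from the difference equation (8) with conditions (9) and (10), from the sum (11), and from the series (12). Function names are illustrative; formulas (11) and (12) are used in the form assumed here, with the convention that the bracket in (11) is replaced by 1 when n − a − 2l = 0.

```python
from fractions import Fraction
from math import comb

def ruin_by_recurrence(p, a, n):
    """y_{a,n} from equation (8) with conditions (9) and (10)."""
    p = Fraction(p)
    q = 1 - p
    size = a + n + 2                       # larger fortunes cannot be lost in n games
    y = [Fraction(0)] * size               # t = 0: condition (10)
    y[0] = Fraction(1)                     # condition (9)
    for _ in range(n):
        nxt = [Fraction(0)] * size
        nxt[0] = Fraction(1)
        for x in range(1, size - 1):
            nxt[x] = p * y[x + 1] + q * y[x - 1]
        y = nxt
    return y[a]

def ruin_by_formula_11(p, a, n):
    p = Fraction(p)
    q = 1 - p
    if n < a:
        return Fraction(0)
    total = Fraction(0)
    for l in range((n - a) // 2 + 1):
        e = n - a - 2 * l
        bracket = Fraction(1) if e == 0 else p**e + q**e
        total += comb(n, l) * (p * q) ** l * bracket
    return q**a * total

def ruin_by_formula_12(p, a, n):
    p = Fraction(p)
    q = 1 - p
    if n < a:
        return Fraction(0)
    total = Fraction(0)
    for i in range((n - a) // 2 + 1):
        # a(a+i+1)...(a+2i-1) / i!  equals  a/(a+2i) * C(a+2i, i)
        total += Fraction(a * comb(a + 2 * i, i), a + 2 * i) * (p * q) ** i
    return q**a * total

p = Fraction(2, 5)
print(ruin_by_recurrence(p, 3, 9), ruin_by_formula_11(p, 3, 9), ruin_by_formula_12(p, 3, 9))
```

The recurrence costs on the order of n(a + n) operations, whereas (12) needs only about (n − a)/2 terms, which is why the series form is preferable for hand computation when n is moderate.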

The problem concerning the probability of ruin in the course of a 
prescribed number of games for a player playing against an infinitely 
rich adversary was first considered by de Moivre, who gave both the 
preceding solutions without proof; it was later solved completely by 
Lagrange and Laplace. The elementary treatment can be found in 
Bertrand's "Calcul des probabilités."

6. Formulas (11) and (12), though elegant and useful when n is not
large, become impracticable when n is somewhat large, and that is precisely
the most interesting case. Since the question of the risk of ruin
incurred in playing equitable games possesses special interest, it would not
be out of place at least to indicate here, though without proof, a convenient
approximate expression for the probability $y_{a,n}$ in case of a large
n and p = q = 1/2. Let t be defined by


V2{n + iy 

then for n ≥ 50 it is possible to establish the approximate formula

ya.n = 1-4= + / 

Vt-Jo on 


where — 1 < ^ < 1. Suppose, for instance, that the fortune of a player 
amounts to $100, each stake being $1, and he decides to play 1,000,
5,000, 10,000, 100,000, 1,000,000 games. Corresponding to these cases,
we find

t = 2.2354, 0.9999, 0.7071, 0.2236, 0.0707

and hence

$$\frac{2}{\sqrt{\pi}}\int_0^t e^{-z^2}dz = 0.9984, 0.8427, 0.6827, 0.2482, 0.0796.$$

The corresponding approximate values of $y_{100,n}$ are

0.0016, 0.1573, 0.3173, 0.7518, 0.9204.

Thus, for a player possessing $100 there is very little risk of being ruined 
in the course of 1,000 games even if he stakes $1 at each game. The risk 
is considerably larger, but still fairly small, when 5,000 games are played. 
In 10,000 games we can bet 2 to 1 that the player will still be able to 
continue. But when the limit set for the number of games becomes 
100,000, we can bet 3 to 1 that the player will be ruined somewhere in the 
course of those 100,000 games. Finally, there is little chance to escape 
ruin in a series of 1,000,000 games. The risk of ruin naturally increases 
with the number of games, but not so fast as might appear at first sight. 
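
The table above is easy to reproduce approximately. In the sketch below t is taken simply as a/√(2n); this is a simplifying assumption (the definition of t in the text contains a small correction under the radical), and the remainder term of the approximate formula is ignored. For n = 1,000 the exact value of $y_{100,n}$ is also computed from the series (12) with p = q = 1/2.

```python
from fractions import Fraction
from math import comb, erf, sqrt

def ruin_approx(a, n):
    """1 - (2/sqrt(pi)) * integral_0^t exp(-z^2) dz with t = a / sqrt(2n) (assumed)."""
    return 1 - erf(a / sqrt(2 * n))

def ruin_exact(a, n):
    """Exact y_{a,n} for p = q = 1/2, summing the series (12)."""
    total = Fraction(0)
    for i in range((n - a) // 2 + 1):
        total += Fraction(a * comb(a + 2 * i, i), (a + 2 * i) * 4**i)
    return float(total / 2**a)

for n in (1_000, 5_000, 10_000, 100_000, 1_000_000):
    print(n, round(ruin_approx(100, n), 4))
print("exact for n = 1,000:", ruin_exact(100, 1_000))
```

The printed values agree with the table to within about one unit of the fourth decimal, the difference coming from the simplified t.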

7. We conclude this chapter by solving the following problem, 
where the fortunes of both players are finite. 

Problem 4. Players A and B agree to play not more than n games, 
the probabilities of winning a single game being p and q, respectively. 
Assuming that the fortunes of A and B amount to a and b single stakes 
which are equal for both, find the probability for A to be ruined in the 
course of n games. 

Solution. Let $z_{x,t}$ be the probability for the player A to be ruined
when his fortune is x (and that of his adversary a + b − x) and he can
play only t games. Evidently $z_{x,t}$ satisfies the equation

$$z_{x,t} = pz_{x+1,t-1} + qz_{x-1,t-1} \tag{13}$$

perfectly similar to equation (8), but the complementary conditions
serving to determine $z_{x,t}$ completely are different. First we have

$$z_{0,t} = 1 \quad \text{for} \quad t \geq 0. \tag{14}$$

Next,

$$z_{a+b,t} = 0 \quad \text{for} \quad t \geq 0, \tag{15}$$

because if A gets all the money from B, the games stop and A cannot be
ruined. Finally,

$$z_{x,0} = 0 \quad \text{for} \quad x = 1, 2, 3, \ldots, a + b - 1, \tag{16}$$

because A, having money left at the end of play, naturally cannot be
ruined.
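
Equation (13) with conditions (14), (15), (16) determines $z_{x,t}$ step by step, which gives a simple way to compute it before any analytical expression is available. The sketch below does so in exact arithmetic; the function name and the small example (a = 1, b = 2, equitable games) are illustrative.

```python
from fractions import Fraction

def ruin_within(p, a, b, n):
    """z_{a,n}: probability that A is ruined within n games, by recurrence (13)."""
    p = Fraction(p)
    q = 1 - p
    total = a + b
    z = [Fraction(0)] * (total + 1)        # t = 0: conditions (15) and (16)
    z[0] = Fraction(1)                     # condition (14)
    for _ in range(n):
        nxt = z[:]                         # boundary values stay fixed
        for x in range(1, total):
            nxt[x] = p * z[x + 1] + q * z[x - 1]
        z = nxt
    return z[a]

half = Fraction(1, 2)
print([ruin_within(half, 1, 2, n) for n in range(0, 6)])
```

As n increases the values approach b/(a + b), the probability of ruin found in Prob. 1 for an unlimited number of equitable games.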

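Since (13) has two series of particular solutions

\[ \alpha^{x}\beta^{t} \quad \text{and} \quad \alpha'^{x}\beta^{t} \]

where \( \alpha \) and \( \alpha' \) are roots of the equation

\[ p\alpha^2 - \beta\alpha + q = 0 \]

both developable into series of descending powers of \( \beta \) for \( |\beta| > 1 \), we
shall seek \( z_{x,t} \) in the form

\[ z_{x,t} = \frac{1}{2\pi i} \int \beta^{t} \left[ f(\beta)\alpha^{x} + \varphi(\beta)\alpha'^{x} \right] d\beta. \]

Here the integration is made along a circle of sufficiently large radius and
\( f(\beta) \) and \( \varphi(\beta) \) are two unknown functions which can be developed into
series of descending powers of \( \beta \). Obviously \( z_{x,t} \) satisfies (13) identically
in x and t. For x = 0 and t ≥ 0 we have the condition

\[ \frac{1}{2\pi i} \int \beta^{t} \left[ f(\beta) + \varphi(\beta) \right] d\beta = 1; \quad t = 0, 1, 2, \ldots \]

which is satisfied if

(17) \[ f(\beta) + \varphi(\beta) = \frac{1}{\beta - 1}. \]

Condition (15) will be satisfied if

(18) \[ f(\beta)\alpha^{a+b} + \varphi(\beta)\alpha'^{a+b} = 0 \]

and it remains to show that at the same time (16) is satisfied. Solving
(17) and (18), we have

\[ f(\beta) = \frac{1}{\beta - 1} \cdot \frac{\alpha'^{a+b}}{\alpha'^{a+b} - \alpha^{a+b}}, \qquad \varphi(\beta) = -\frac{1}{\beta - 1} \cdot \frac{\alpha^{a+b}}{\alpha'^{a+b} - \alpha^{a+b}} \]

and, since \( \alpha\alpha' = q/p \),

(19) \[ f(\beta)\alpha^{x} + \varphi(\beta)\alpha'^{x} = \frac{\alpha'^{a+b}\alpha^{x} - (q/p)^{x}\alpha^{a+b-x}}{(\beta - 1)(\alpha'^{a+b} - \alpha^{a+b})}. \]

Now let \( \alpha \) be the root vanishing for \( \beta = \infty \) and \( \alpha' \) the other root whose
development in series of descending powers of \( \beta \) starts with the term
containing \( \beta \). Evidently the development of (19) for

\[ x = 1, 2, 3, \ldots, a + b - 1 \]

does not contain terms involving the first power of \( 1/\beta \), and hence
\( z_{x,0} = 0 \) if \( x = 1, 2, 3, \ldots, a + b - 1 \), as it should be. The solution
of (13) satisfying (14), (15), (16) being unique, its analytical expression is
therefore

\[ z_{x,t} = \frac{1}{2\pi i} \int \frac{\beta^{t} \left[ \alpha'^{a+b}\alpha^{x} - (q/p)^{x}\alpha^{a+b-x} \right]}{(\beta - 1)(\alpha'^{a+b} - \alpha^{a+b})} \, d\beta \]

whence for x = a and t = n

\[ z_{a,n} = \frac{1}{2\pi i} \int \left( \frac{q}{p} \right)^{a} \frac{\beta^{n}(\alpha'^{b} - \alpha^{b})}{(\beta - 1)(\alpha'^{a+b} - \alpha^{a+b})} \, d\beta. \]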

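To find an explicit expression for \( z_{a,n} \) it remains to find the coefficient of
\( 1/\beta \) in the development of

\[ P = \left( \frac{q}{p} \right)^{a} \frac{\beta^{n}(\alpha'^{b} - \alpha^{b})}{(\beta - 1)(\alpha'^{a+b} - \alpha^{a+b})} \]

in series of descending powers of \( \beta \). This can be done in two different
ways. First we can substitute for \( \alpha' \) its expression in \( \alpha \):

\[ \alpha' = \frac{q}{p\alpha} \]

and present P in the form

\[ P = \frac{\beta^{n}}{\beta - 1} \cdot \frac{\alpha^{a} - (p/q)^{b}\alpha^{a+2b}}{1 - (p/q)^{a+b}\alpha^{2a+2b}} \]

or developing into series

\[ P = \frac{\beta^{n}}{\beta - 1} \left[ \alpha^{a} - \left( \frac{p}{q} \right)^{b} \alpha^{a+2b} + \left( \frac{p}{q} \right)^{a+b} \alpha^{3a+2b} - \left( \frac{p}{q} \right)^{a+2b} \alpha^{3a+4b} + \cdots \right]. \]

But the coefficient of \( 1/\beta \) in

\[ \frac{\alpha^{m}\beta^{n}}{\beta - 1} \]

by the second solution of Prob. 3 is the probability \( y_{m,n} \) for a player with
a fortune m to be ruined by an infinitely rich player in the course of n
games. Hence, the final expression for \( z_{a,n} \) is

\[ z_{a,n} = y_{a,n} - \left( \frac{p}{q} \right)^{b} y_{a+2b,n} + \left( \frac{p}{q} \right)^{a+b} y_{3a+2b,n} - \left( \frac{p}{q} \right)^{a+2b} y_{3a+4b,n} + \cdots, \]

the terms of this series being alternately of the form

\[ \left( \frac{p}{q} \right)^{k(a+b)} y_{(2k+1)a+2kb,\,n} \quad \text{and} \quad -\left( \frac{p}{q} \right)^{k(a+b)+b} y_{(2k+1)a+(2k+2)b,\,n} \]

for k = 0, 1, 2, .... The series stops by itself as soon as the first
subscript of \( y_{x,n} \) becomes greater than n.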
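
The alternating series lends itself to a direct numerical check. The following is a minimal sketch, assuming exact rational arithmetic and the hypothetical helper names `ruin_finite`, `ruin_infinite` and `ruin_series` (none of which occur in the text): the first iterates equation (13) under conditions (14), (15), (16), the second obtains \( y_{m,n} \) by giving the adversary n + 1 stakes, which he cannot lose in n games, and the third sums the series until the first subscript exceeds n.

```python
from fractions import Fraction


def ruin_finite(a, b, n, p):
    """z_{a,n}: iterate equation (13) under conditions (14), (15), (16)."""
    q = 1 - p
    top = a + b
    # t = 0: z_{0,0} = 1 and z_{x,0} = 0 for x >= 1.
    z = [Fraction(1)] + [Fraction(0)] * top
    for _ in range(n):
        inner = [p * z[x + 1] + q * z[x - 1] for x in range(1, top)]
        # The boundary values z_{0,t} = 1 and z_{a+b,t} = 0 are kept fixed.
        z = [Fraction(1)] + inner + [Fraction(0)]
    return z[a]


def ruin_infinite(m, n, p):
    """y_{m,n}: an adversary holding n + 1 stakes cannot be ruined in n games,
    so he acts like an infinitely rich one (an assumption of this sketch)."""
    if m > n:
        return Fraction(0)
    return ruin_finite(m, n + 1, n, p)


def ruin_series(a, b, n, p):
    """Sum of the alternating series for z_{a,n}."""
    ratio = p / (1 - p)
    total, k = Fraction(0), 0
    while (2 * k + 1) * a + 2 * k * b <= n:
        m1 = (2 * k + 1) * a + 2 * k * b
        m2 = m1 + 2 * b
        total += ratio ** (k * (a + b)) * ruin_infinite(m1, n, p)
        total -= ratio ** (k * (a + b) + b) * ruin_infinite(m2, n, p)
        k += 1
    return total


if __name__ == "__main__":
    p = Fraction(2, 5)
    print(ruin_series(3, 4, 25, p), ruin_finite(3, 4, 25, p))
```

Both printed fractions should coincide; the probability p must be passed as a `Fraction` for the comparison to be exact.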

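To obtain a second expression of \( z_{a,n} \) we notice that

\[ \frac{\alpha'^{b} - \alpha^{b}}{\alpha'^{a+b} - \alpha^{a+b}} \]

is a rational function of \( \beta \) whose denominator

\[ R = \frac{\alpha'^{a+b} - \alpha^{a+b}}{\alpha' - \alpha} \]

is a polynomial in \( \beta \) of the degree a + b − 1. To find the roots of R = 0,
we set \( \beta = 2\sqrt{pq} \cos\varphi \). Since, then,

\[ \alpha' = \sqrt{\frac{q}{p}}\, e^{i\varphi}, \qquad \alpha = \sqrt{\frac{q}{p}}\, e^{-i\varphi}, \]

we have

\[ R = \left( \frac{q}{p} \right)^{\frac{a+b-1}{2}} \frac{\sin (a + b)\varphi}{\sin\varphi}. \]

The equation

\[ \frac{\sin (a + b)\varphi}{\sin\varphi} = 0 \]

having roots

\[ \varphi_h = \frac{h\pi}{a + b}; \quad h = 1, 2, \ldots, a + b - 1, \]

the a + b − 1 roots of R are

\[ \beta_h = 2\sqrt{pq} \cos\varphi_h. \]

Now we can resolve the rational function P into a sum of simple elements
as follows:

\[ P = E(\beta) + \frac{A_0}{\beta - 1} + \sum_{h=1}^{a+b-1} \frac{A_h}{\beta - \beta_h} \]

where

\[ A_0 = \frac{q^{a}(p^{b} - q^{b})}{p^{a+b} - q^{a+b}} \]

and for h > 0

\[ A_h = -\frac{(2\sqrt{pq})^{n+1}(q/p)^{a/2}}{a + b} \cdot \frac{\sin\varphi_h \sin a\varphi_h \, (\cos\varphi_h)^{n}}{1 - 2\sqrt{pq}\cos\varphi_h}, \]

while \( E(\beta) \) is the integral part of P. The coefficient of \( 1/\beta \) in the
development of P being

\[ A_0 + \sum_{h=1}^{a+b-1} A_h, \]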

we have a new explicit expression for \( z_{a,n} \):

(20) \[ z_{a,n} = \frac{q^{a}(p^{b} - q^{b})}{p^{a+b} - q^{a+b}} - \frac{(2\sqrt{pq})^{n+1}(q/p)^{a/2}}{a + b} \sum_{h=1}^{a+b-1} \frac{\sin\dfrac{\pi h}{a+b} \, \sin\dfrac{\pi a h}{a+b} \left( \cos\dfrac{\pi h}{a+b} \right)^{n}}{1 - 2\sqrt{pq}\cos\dfrac{\pi h}{a+b}}. \]

This expression shows clearly that \( z_{a,n} \), with increasing n, approaches
the limit

\[ \frac{q^{a}(p^{b} - q^{b})}{p^{a+b} - q^{a+b}} \]

representing the probability of ruin when the number of games is unlimited,
in complete accord with the solution of Prob. 1.

The first term in (20) naturally must be replaced by \( \frac{b}{a+b} \) in case
\( p = q = \frac{1}{2} \). This form of solution was given first by Lagrange.
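
Formula (20) can be compared with a direct iteration of equation (13). The sketch below is only an illustration under stated assumptions: the function names `ruin_dp` and `ruin_formula_20` are hypothetical, floating-point arithmetic is used, and the trigonometric sum is coded as the limit term minus \( (2\sqrt{pq})^{n+1}(q/p)^{a/2}/(a+b) \) times the sum over h = 1, ..., a + b − 1 of \( \sin\varphi_h \sin a\varphi_h (\cos\varphi_h)^n / (1 - 2\sqrt{pq}\cos\varphi_h) \) with \( \varphi_h = \pi h/(a+b) \).

```python
import math


def ruin_dp(a, b, n, p):
    """z_{a,n} by iterating z_{x,t} = p*z_{x+1,t-1} + q*z_{x-1,t-1}."""
    q = 1.0 - p
    top = a + b
    z = [1.0] + [0.0] * top  # t = 0: only x = 0 counts as ruin
    for _ in range(n):
        inner = [p * z[x + 1] + q * z[x - 1] for x in range(1, top)]
        z = [1.0] + inner + [0.0]  # boundary values stay fixed
    return z[a]


def ruin_formula_20(a, b, n, p):
    """z_{a,n} from the trigonometric sum of formula (20)."""
    q = 1.0 - p
    s = a + b
    if p == q:
        limit = b / s  # replacement of the first term when p = q = 1/2
    else:
        limit = q**a * (p**b - q**b) / (p**s - q**s)
    c = 2.0 * math.sqrt(p * q)
    acc = 0.0
    for h in range(1, s):
        phi = math.pi * h / s
        acc += (math.sin(phi) * math.sin(a * phi) * math.cos(phi) ** n
                / (1.0 - c * math.cos(phi)))
    return limit - c ** (n + 1) * (q / p) ** (a / 2) / s * acc


if __name__ == "__main__":
    for p in (0.5, 0.45):
        print(p, ruin_dp(2, 3, 20, p), ruin_formula_20(2, 3, 20, p))
```

The two columns should agree up to rounding error; the factor \( (2\sqrt{pq}\cos\varphi_h)^n \) makes the correction term die out geometrically as n grows.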


Problems for Solution 

1. Players A and B with fortunes of $50 and $100, respectively, agree to play until
one of them is ruined. The probabilities of winning a single game are 2/3 and 1/3,
respectively, for A and B, and they stake $1 at each game. What is the probability
of ruin for the player A? Ans. Very nearly \( 2^{-50} = 8.88 \cdot 10^{-16} \).

2. If A and B at each single game stake $3 and $2, respectively, and have fortunes 
of $30 and $20 at the beginning, what is the approximate value of the probability 
that A will be ruined if the probability of his winning a single game is (a) p = 

(6) P = 


Ans. (a) 0.40 -f A; |a| < 1.7 X lO”*; (b) 0.96 + A; |a| < 4.6 X 10“». 

3. A player A with the fortune $a plays an unlimited number of games against an
infinitely rich adversary with the probability p of winning a single game. He stakes
$1 at each game, while his rich adversary risks staking such a sum \( \beta \) as to make the
game favorable to A. What is the probability that A will be ruined in the course
of the games? Give numerical results if (a) a = 10, p = 1/2, \( \beta = 3 \); (b) a = 100,
p = 1/2, \( \beta = 3 \). Ans. Let \( \theta < 1 \) be a positive root of the equation \( p\theta^{\beta+1} - \theta + q = 0 \).

The required probability P is: \( P = \theta^{a} \).

In case (a) P = 0.002257; in case (b) \( P = 3.43 \cdot 10^{-27} \).
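
The numerical answers can be reproduced with a few lines of code. This is a minimal sketch, assuming p = 1/2 and a stake of 3 for the adversary in both cases; the function name `ruin_root` is hypothetical, and the root is found by the simple iteration \( \theta \leftarrow p\theta^{\beta+1} + q \) started from 0, which increases monotonically to the smallest positive root.

```python
def ruin_root(p, beta, iterations=200):
    """Smallest positive root of p*theta**(beta + 1) - theta + q = 0."""
    q = 1.0 - p
    theta = 0.0
    for _ in range(iterations):
        # Fixed-point step; the sequence is increasing and bounded by the root.
        theta = p * theta ** (beta + 1) + q
    return theta


if __name__ == "__main__":
    theta = ruin_root(0.5, 3)
    print(theta)         # the root theta < 1
    print(theta ** 10)   # case (a)
    print(theta ** 100)  # case (b)
```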

4. A player A whose fortune is $10 agrees to play not more than 20 games against 

an infinitely rich adversary, both staking $1 with an equal probability of winning a
single game. What is the probability that A will not be ruined in the course of 
20 games? Ans. 0.9734. 
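
The answer to this problem can be confirmed by following the distribution of A's fortune game by game. The sketch below assumes exact rational arithmetic; the function name `survive_probability` is hypothetical.

```python
from fractions import Fraction


def survive_probability(fortune, games):
    """Probability that a player with the given fortune, staking 1 per
    equitable game against an infinitely rich adversary, is not ruined."""
    half = Fraction(1, 2)
    # dist[x] = probability of holding x without having been ruined so far
    dist = {fortune: Fraction(1)}
    for _ in range(games):
        nxt = {}
        for x, pr in dist.items():
            for y in (x - 1, x + 1):
                if y > 0:  # y == 0 means ruin; that probability is dropped
                    nxt[y] = nxt.get(y, 0) + pr * half
        dist = nxt
    return sum(dist.values())


if __name__ == "__main__":
    print(float(survive_probability(10, 20)))
```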

5. Players A and B with $1 and $2, respectively, agree to play not more than n
equitable games, staking $1 at each game. What are the probabilities of their ruin?

Ans. For A: \( \dfrac{2}{3} - \dfrac{3 + (-1)^{n}}{3 \cdot 2^{n+1}} \); for B: \( \dfrac{1}{3} - \dfrac{3 - (-1)^{n}}{3 \cdot 2^{n+1}} \).


6. Players A and B with $2 and $3, respectively, play a series of equitable games,
both staking $1 at each game. What are the probabilities of their ruin in n games?
Give the numerical result if n = 20. Ans.


r-(^) 




■BlV 




ε = 1 if n is odd, ε = 2 if n is even.


η = 1 if n is even, η = 2 if n is odd.


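7. Find the expression of \( y_{a,n} \), the probability of the ruin of A when his adversary
B is infinitely rich, corresponding to formula (20). Ans. From the definition of a
definite integral it follows that

\[ y_{a,n} = y_{a,\infty} - \frac{(2\sqrt{pq})^{n+1}(q/p)^{a/2}}{\pi} \int_0^{\pi} \frac{\sin\varphi \, \sin a\varphi}{1 - 2\sqrt{pq}\cos\varphi} (\cos\varphi)^{n} \, d\varphi \]

where

\[ y_{a,\infty} = 1 \ \text{if } p \le q; \qquad y_{a,\infty} = \left( \frac{q}{p} \right)^{a} \ \text{if } p > q. \]

If the games are equitable and n differs from a by an even number, then

\[ y_{a,n} = 1 - \frac{2}{\pi} \int_0^{\pi/2} \frac{\sin a\varphi}{\sin\varphi} (\cos\varphi)^{n+1} \, d\varphi. \]

This formula was given by Laplace.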

8 . Referring to the last formula in the preceding problem, show that 

2/a,n = 

where 

f ^ 

V2(n + i)' " 27m n' 


_2 




‘du + A 


2
Indication of the Proof. It is important to prove the following inequalities first


whence 


<p (cos 
sin <p 


(p{C0S 


> c 


for Q < ^p S - 


for 0 < ^ ^ 


ip (cos ipY^'^ 
sin ip 


n + l ,r 
2 1 - 


n + 1 

Q - 


0 < 0 <1 


provided 0 < ^ 7r/4. The rest of the proof is easy. 

9. Attempt a direct proof of the important lemma (page 144) used in the
discussion of Prob. 2.

Hint: The proof can be based upon the following proposition, generalizing an
important theorem on determinants due to Minkowski: Let

\[ f_i = a_{1i}x_1 + a_{2i}x_2 + \cdots + a_{ni}x_n; \quad i = 1, 2, 3, \ldots, n \]

be a system of linear forms whose coefficients satisfy the following conditions:

(1) \( a_{ii} > 0 \); \( a_{ki} \le 0 \) if \( k \ne i \); \( a_{1i} + a_{2i} + \cdots + a_{ni} \ge 0 \).

(2) One of these sums is positive.

If these forms assume nonnegative values, then every \( x_i \ge 0 \) (i = 1, 2, ... n).
Proof by induction: Express \( x_n \) through \( x_1, x_2, \ldots, x_{n-1} \), thus:

\[ x_n = \frac{f_n - a_{1n}x_1 - a_{2n}x_2 - \cdots - a_{n-1,n}x_{n-1}}{a_{nn}} \]

and substitute into the remaining forms. Show that the resulting forms in \( x_1, x_2, \ldots, x_{n-1} \)
satisfy the same conditions (1) and (2). Hence, it remains to prove the
proposition for two forms, which can easily be done.


CHAPTER IX


MATHEMATICAL EXPECTATION 

1. Bernoulli’s theorem, important though it is, is but the first link
in a chain of theorems of the same character, all contained in an extremely
general proposition with which we shall deal in the next chapter. But
before proceeding to this task, it is necessary to extend the definition of
“mathematical expectation,” an important concept originating in
connection with games of chance.

If, according to the conditions of the game, the player can win a
sum a with probability p, and lose a sum b with probability q = 1 − p,
the mathematical expectation of his gain is by definition

pa − qb.

Considering the loss as a negative gain, we may say that the gain of the
player may have only two values, a and −b, with the corresponding
probabilities p and q, so that the expectation of his gain is the sum of the
products of two possible values of the gain by their probabilities. In this
case, the gain appears as a variable quantity possessing two values.

Variable quantities with a definite range of values each one of which,
depending on chance, can be attained with a definite probability, are
called “chance variables,” or, using a Greek term, “stochastic” variables.
They play an important part in the theory of probability. A stochastic
variable is defined (a) if the set of its possible values is given, and (b) if
the probability to attain each particular value is also given.

It is easy to give examples of stochastic variables. The gain in a
game of chance is a stochastic variable with two values. The number of
points on a die that is tossed, is a stochastic variable with six values,
1, 2, ... 6, each of which has the same probability 1/6. A number on
a ticket drawn from an urn containing 20 tickets numbered from 1 to 20,
is a stochastic variable with 20 values, and the probability to attain
any one of them is 1/20. Each of two urns contains 2 white and 2 black
balls. Simultaneously, one ball is transferred from the first urn into the
second, while one ball from the latter is transferred into the first. After
this exchange, the number of white balls in one of the urns may be regarded
as a stochastic variable with three values, 1, 2, 3, whose corresponding
probabilities are, respectively, 1/4, 1/2, 1/4. It is natural to extend the
concept of mathematical expectation to stochastic variables in general.

Suppose that a stochastic variable x possesses n values:

\[ x_1, x_2, \ldots, x_n, \]

and

\[ p_1, p_2, \ldots, p_n \]

denote the respective probabilities for x to assume values \( x_1, x_2, \ldots, x_n \).
By definition the mathematical expectation of x is

\[ E(x) = p_1x_1 + p_2x_2 + \cdots + p_nx_n. \]

It is understood in this definition that the possible values of the
variable x are numerically different. For instance, if the variable is a
number of points on a die, its numerically different values are 1, 2, 3, 4, 5,
6, each having the same probability, 1/6. By definition, the mathematical
expectation of the number of points on a die is

\[ \tfrac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = 3.5. \]

If the variable is the number on a ticket drawn from an urn containing
20 tickets numbered from 1 to 20, its numerically different values are
represented by numbers from 1 to 20, and the probability of each of
these values is 1/20, so that the mathematical expectation of the number
on a ticket is

\[ \tfrac{1}{20}(1 + 2 + \cdots + 20) = 10.5. \]

2. It is obvious that the computation of mathematical expectation
requires only the knowledge of the numerically different values of the
variables with their respective probabilities. But in some cases this
computation is greatly simplified by extending the definition of mathematical
expectation. Suppose that, corresponding to mutually exclusive
and exhaustive cases \( A_1, A_2, \ldots, A_m \), the variable x assumes the values
\( x_1, x_2, \ldots, x_m \), with the corresponding probabilities \( p_1, p_2, \ldots, p_m \);
we can define the mathematical expectation of x by

\[ E(x) = p_1x_1 + p_2x_2 + \cdots + p_mx_m. \]

What distinguishes this extended definition from the original one is that
in the second definition the values \( x_1, x_2, \ldots, x_m \) need not be numerically
different; the only condition is that they are determined by mutually
exclusive and exhaustive cases.

To make this distinction clear, suppose that the variable x is the
number of points on two dice. Numerically different values of this
variable are

2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

and their respective probabilities

1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, 1/36.

Therefore, by original definition, the expectation of x is

2/36 + 6/36 + 12/36 + 20/36 + 30/36 + 42/36 + 40/36 + 36/36 + 30/36 + 22/36 + 12/36 = 252/36 = 7.

But we can distinguish 36 exhaustive and mutually exclusive cases according
to the number of points on each die and, correspondingly, 36 values
of the variable x, as shown in the following table:

First die   Second die    x      First die   Second die    x
    1           1         2          4           1         5
    1           2         3          4           2         6
    1           3         4          4           3         7
    1           4         5          4           4         8
    1           5         6          4           5         9
    1           6         7          4           6        10
    2           1         3          5           1         6
    2           2         4          5           2         7
    2           3         5          5           3         8
    2           4         6          5           4         9
    2           5         7          5           5        10
    2           6         8          5           6        11
    3           1         4          6           1         7
    3           2         5          6           2         8
    3           3         6          6           3         9
    3           4         7          6           4        10
    3           5         8          6           5        11
    3           6         9          6           6        12


The probability of each of these 36 cases being 1/36, by the extended
definition the mathematical expectation of x is

\[ \frac{2 + 3 \cdot 2 + 4 \cdot 3 + 5 \cdot 4 + 6 \cdot 5 + 7 \cdot 6 + 8 \cdot 5 + 9 \cdot 4 + 10 \cdot 3 + 11 \cdot 2 + 12}{36} = 7 \]

as it should be.
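
The agreement of the two definitions for two dice can be verified by enumeration. A minimal sketch, assuming exact fractions and the hypothetical function names `expectation_extended` and `expectation_original`:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The 36 mutually exclusive and exhaustive cases (first die, second die).
CASES = list(product(range(1, 7), repeat=2))


def expectation_extended():
    """Extended definition: sum over the 36 cases, values may repeat."""
    return sum(Fraction(1, 36) * (i + j) for i, j in CASES)


def expectation_original():
    """Original definition: sum over numerically different values 2..12."""
    counts = Counter(i + j for i, j in CASES)
    return sum(Fraction(c, 36) * value for value, c in counts.items())


if __name__ == "__main__":
    print(expectation_extended(), expectation_original())
```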

It is important to show that both definitions always give the same
value for the mathematical expectation.

Let \( x_1, x_2, \ldots, x_m \) be the values of the variable x corresponding
to mutually exclusive and exhaustive cases \( A_1, A_2, \ldots, A_m \), and
\( p_1, p_2, \ldots, p_m \) their respective probabilities. By the extended
definition of mathematical expectation, we have

(1) \[ E(x) = p_1x_1 + p_2x_2 + \cdots + p_mx_m. \]

The values \( x_1, x_2, \ldots, x_m \) are not necessarily numerically different,
the numerically different values being

\[ \xi, \eta, \zeta, \ldots, \lambda. \]

We can suppose that the notation is chosen in such a way that

\( x_1, x_2, \ldots, x_a \) are equal to \( \xi \);
\( x_{a+1}, x_{a+2}, \ldots, x_b \) are equal to \( \eta \);
\( x_{b+1}, x_{b+2}, \ldots, x_c \) are equal to \( \zeta \);
. . . . . . . . . . . . . . . .
\( x_{l+1}, x_{l+2}, \ldots, x_m \) are equal to \( \lambda \).

Hence, the right-hand member of (1) can be represented thus:

\[ (p_1 + p_2 + \cdots + p_a)\xi + (p_{a+1} + p_{a+2} + \cdots + p_b)\eta + (p_{b+1} + p_{b+2} + \cdots + p_c)\zeta + \cdots + (p_{l+1} + p_{l+2} + \cdots + p_m)\lambda. \]

But by the theorem of total probabilities, the sum

\[ p_1 + p_2 + \cdots + p_a \]

represents the probability P for the variable x to assume a determined
value \( \xi \) because this can happen in a mutually exclusive ways; namely,
when \( x = x_1 \), or \( x = x_2 \), ... or \( x = x_a \). By a similar argument we see
that the sums

\[ p_{a+1} + p_{a+2} + \cdots + p_b \]
\[ p_{b+1} + p_{b+2} + \cdots + p_c \]
\[ \cdots \]
\[ p_{l+1} + p_{l+2} + \cdots + p_m \]

represent the probabilities Q, R, ... T for the variable x to assume
values \( \eta, \zeta, \ldots, \lambda \). Therefore, the right-hand member of (1) reduces
to the sum

\[ P\xi + Q\eta + R\zeta + \cdots + T\lambda \]

which, by the original definition, is the mathematical expectation of x.

If, corresponding to mutually exclusive and exhaustive cases, a
variable x assumes the same value a (in other words, remains constant),
it is almost evident that its mathematical expectation is a, because the
sum of the probabilities of mutually exclusive and exhaustive cases is 1.
It is also evident that the expectation of ax, where a is a constant, is
equal to a times the expectation of x.

Note: Very often the mathematical expectation of a stochastic variable is called
its “mean value.”

Mathematical Expectation of a Sum
3. In many cases the computation of mathematical expectation is
greatly facilitated by means of the following very general theorem:

Theorem. The mathematical expectation of the sum of several variables
is equal to the sum of their expectations; or, in symbols,

\[ E(x + y + z + \cdots + w) = E(x) + E(y) + E(z) + \cdots + E(w). \]

Proof. We shall prove this theorem first in the case of a sum of two
variables. Let x assume numerically different values \( x_1, x_2, \ldots, x_m \),
while numerically different values of y are \( y_1, y_2, \ldots, y_n \). In regard to
the sum x + y we can distinguish mn mutually exclusive cases; namely,
when x assumes a definite value \( x_i \) and y another definite value \( y_j \), while i
and j range respectively over numbers 1, 2, 3, ... m and 1, 2, 3, ... n.
If \( p_{ij} \) denotes the probability of coexistence of the equalities

\[ x = x_i, \quad y = y_j \]

we have by the extended definition of mathematical expectation

\[ E(x + y) = \sum_{i=1}^{m} \sum_{j=1}^{n} p_{ij}(x_i + y_j), \]

or

(2) \[ E(x + y) = \sum_{i=1}^{m} \sum_{j=1}^{n} p_{ij}x_i + \sum_{i=1}^{m} \sum_{j=1}^{n} p_{ij}y_j. \]

As the variable x assumes a definite value \( x_i \) in n mutually exclusive
ways (namely, when the value \( x_i \) of x is accompanied by the values
\( y_1, y_2, \ldots, y_n \) of y) it is obvious that the sum

\[ \sum_{j=1}^{n} p_{ij} \]

represents the probability \( p_i \) of the equality \( x = x_i \). In a similar manner
we see that the sum

\[ \sum_{i=1}^{m} p_{ij} \]

represents the probability \( q_j \) of the equality \( y = y_j \). Therefore

\[ \sum_{i=1}^{m} \sum_{j=1}^{n} p_{ij}x_i = \sum_{i=1}^{m} x_i \sum_{j=1}^{n} p_{ij} = \sum_{i=1}^{m} p_ix_i = E(x), \]

and similarly

\[ \sum_{i=1}^{m} \sum_{j=1}^{n} p_{ij}y_j = \sum_{j=1}^{n} y_j \sum_{i=1}^{m} p_{ij} = \sum_{j=1}^{n} q_jy_j = E(y); \]

that is, by (2)

\[ E(x + y) = E(x) + E(y) \]

which proves the theorem for the sum of two variables.

If we deal with the sum of three variables x + y + z, we may consider
it at first as the sum of x + y and z and, applying the foregoing result,
we get

\[ E(x + y + z) = E(x + y) + E(z); \]

and again, by substituting E(x) + E(y) for E(x + y),

\[ E(x + y + z) = E(x) + E(y) + E(z). \]

In a similar way we may proceed farther and prove the theorem for the
sum of any number of variables.

4. The theorem concerning mathematical expectation of sums, 
simple though it is, is of fundamental importance on account of its very 
general nature and will be used frequently. At present, we shall use it 
in the solution of a few selected problems. 

Problem 1. What is the mathematical expectation of the sum of
points on n dice?

Solution. Denoting by \( x_i \) the number of points on the ith die, the
sum of the points on n dice will be

\[ s = x_1 + x_2 + \cdots + x_n \]

and by the preceding theorem

\[ E(s) = E(x_1) + E(x_2) + \cdots + E(x_n). \]

But for every single die

\[ E(x_i) = \frac{7}{2}; \quad i = 1, 2, \ldots, n; \]

therefore

\[ E(s) = \frac{7n}{2}. \]

Problem 2. What is the mathematical expectation of the number of 
successes in n trials with constant probability p? 

Solution. Suppose that we attach to every trial a variable which
has the value 1 in case of a success and the value 0 in case of failure. If
the variables attached to trials 1, 2, 3, ... n are denoted by \( x_1, x_2, \ldots, x_n \),
their sum

\[ m = x_1 + x_2 + \cdots + x_n \]

obviously gives the number of successes in n trials. Therefore, the
required expectation is

\[ E(m) = E(x_1) + E(x_2) + \cdots + E(x_n). \]

But for every i = 1, 2, 3, ... n

\[ E(x_i) = p \cdot 1 + (1 - p) \cdot 0 = p, \]

because \( x_i \) may have values 1 and 0 with the probabilities p and 1 − p
which are the same as the probabilities of a success or a failure in the ith
trial. Hence,

\[ E(m) = np \]

or

\[ E(m - np) = 0, \]

which may also be written in the form

\[ \sum_{m=0}^{n} T_m(m - np) = 0. \]

This result was obtained on page 116 in a totally different and more
complicated way. The new deduction is preferable in that it is more
elementary and can easily be extended to more complicated cases, as
we shall see in the next problem.

Problem 3. Suppose that we have a series of n trials independent or
not, the probability of an event being \( p_i \) in the ith trial when nothing is
known about the results of other trials. What is the mathematical
expectation of the number of successes m in n trials?

Solution. Again let us introduce the variable \( x_i \) connected with
the ith trial in such a way that \( x_i = 1 \) when the trial results in a success
and \( x_i = 0 \) when it results in failure. Obviously,

\[ m = x_1 + x_2 + \cdots + x_n \]

and

\[ E(m) = E(x_1) + E(x_2) + \cdots + E(x_n). \]

But

\[ E(x_i) = 1 \cdot p_i + 0 \cdot (1 - p_i) = p_i \]

and therefore

\[ E(m) = p_1 + p_2 + \cdots + p_n. \]

For instance, if we have 5 urns containing 1 white, 9 black; 2 white,
8 black; 3 white, 7 black; 4 white, 6 black; 5 white, 5 black balls, and we
draw one ball out of every urn, the mathematical expectation of the
number of white balls taken will be:

\[ E(m) = \tfrac{1}{10} + \tfrac{2}{10} + \tfrac{3}{10} + \tfrac{4}{10} + \tfrac{5}{10} = 1.5. \]

Problem 4. An urn contains a white and b black balls, and c balls are
drawn. What is the mathematical expectation of the number of the
white balls drawn?

Solution. To every ball taken we attach a variable which has the
value 1 if the extracted ball is white, and the value 0 otherwise. The
number of white balls drawn will then be

\[ s = x_1 + x_2 + \cdots + x_c. \]

But the probability that the ith ball removed will be white when nothing
is known of the other balls is \( \frac{a}{a+b} \); therefore

\[ E(x_i) = \frac{a}{a+b} \cdot 1 + \frac{b}{a+b} \cdot 0 = \frac{a}{a+b} \]

for every i, and the required expectation is

\[ E(s) = \frac{ca}{a+b}. \]
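
Since the variables attached to the successive balls are dependent, it may be reassuring to confirm the result by listing every possible draw. A minimal sketch, assuming that all sets of c balls are equally likely; the function name `expected_white` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def expected_white(a, b, c):
    """Expected number of white balls among c drawn, by full enumeration."""
    balls = ["white"] * a + ["black"] * b
    draws = list(combinations(range(a + b), c))  # equally likely sets of c balls
    total_white = sum(
        sum(1 for k in draw if balls[k] == "white") for draw in draws
    )
    return Fraction(total_white, len(draws))


if __name__ == "__main__":
    a, b, c = 3, 4, 2
    print(expected_white(a, b, c), Fraction(c * a, a + b))
```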


Problem 5. An urn contains n tickets numbered from 1 to n, and
m tickets are drawn at a time. What is the mathematical expectation
of the sum of numbers on the tickets drawn?

Solution. Suppose that m tickets drawn from the urn are disposed
in a certain order, and a variable is attached to every ticket expressing
its number. Denoting the variable attached to the ith ticket by \( x_i \),
the sum of the numbers on all m tickets apparently is

\[ s = x_1 + x_2 + \cdots + x_m. \]

But when taken singly, the variable \( x_i \) may represent any of the numbers
1, 2, 3, ... n, the probability of its being equal to any one of these
numbers being 1/n. By the definition of mathematical expectation, we
have

\[ E(x_i) = \frac{1 + 2 + 3 + \cdots + n}{n} = \frac{n + 1}{2} \]

and therefore

\[ E(s) = \frac{m(n + 1)}{2}. \]

For example, taking the French lottery where n = 90 and m = 5, we
find for the mathematical expectation of the sum of numbers on all 5
tickets

\[ E(s) = \frac{5 \cdot 91}{2} = 227.5. \]


Problem 6. An urn contains n tickets numbered from 1 to n. These
tickets are drawn one by one, so that a certain number appears in the
first place, another number in the second place, and so on. We shall say
that there is a “coincidence” when the number on a ticket corresponds
to the place it occupies. For instance, there is a coincidence when the
first ticket has number 1 or the second ticket has number 2, etc. Find
the mathematical expectation of the number of coincidences. Also, find
the probability that there will be none, or one, or two, etc., coincidences.

Solution. Let \( x_i \) denote a variable which has the value 1 if there is
coincidence in the ith place, otherwise \( x_i = 0 \). The sum

\[ s = x_1 + x_2 + \cdots + x_n \]

gives the total number of coincidences and

\[ E(s) = E(x_1) + E(x_2) + \cdots + E(x_n). \]

But

\[ E(x_i) = \frac{1}{n} \]

because the probability of drawing a ticket with the number i in the ith
place without any regard to other tickets obviously is 1/n; therefore,

\[ E(s) = n \cdot \frac{1}{n} = 1. \]

On the other hand, denoting the probability of exactly i coincidences by
\( p_i \), we have by definition

\[ E(s) = p_1 + 2p_2 + \cdots + np_n, \]

and, comparing with the preceding result, we obtain

(3) \[ p_1 + 2p_2 + \cdots + np_n = 1. \]

Let us denote by \( \varphi(n) \) the probability that in drawing n tickets, we shall
have no coincidences. It is easy to express \( p_i \) by means of \( \varphi(n - i) \).
In fact, we have exactly i coincidences in

\[ C_n^i = \frac{n(n - 1) \cdots (n - i + 1)}{1 \cdot 2 \cdot 3 \cdots i} \]

mutually exclusive cases; namely, when the tickets of one of the \( C_n^i \)
specified groups of i tickets have numbers corresponding to their places
while the remaining n − i tickets do not present coincidences at all.
By the theorem of compound probability, the probability of i coincidences
in i specified places is

\[ \frac{1}{n} \cdot \frac{1}{n - 1} \cdots \frac{1}{n - i + 1} \]

and the probability of the absence of coincidences in the remaining n − i
places is \( \varphi(n - i) \). The probability of exactly i coincidences in i specified
places is therefore

\[ \frac{\varphi(n - i)}{n(n - 1) \cdots (n - i + 1)}, \]

and the total probability \( p_i \) of exactly i coincidences without specification
of places is

\[ p_i = \frac{n(n - 1) \cdots (n - i + 1)}{1 \cdot 2 \cdot 3 \cdots i} \cdot \frac{\varphi(n - i)}{n(n - 1) \cdots (n - i + 1)} \]

or

(4) \[ p_i = \frac{\varphi(n - i)}{1 \cdot 2 \cdot 3 \cdots i}. \]

The symbol \( \varphi(0) \) has no meaning, but the preceding formula holds
good even for i = n if we assume \( \varphi(0) = 1 \).

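Substituting expression (4) for \( p_i \) into (3), we reach the relation

\[ \varphi(n - 1) + \frac{\varphi(n - 2)}{1!} + \frac{\varphi(n - 3)}{2!} + \cdots + \frac{\varphi(0)}{(n - 1)!} = 1, \]

or changing n into n + 1

\[ \varphi(n) + \frac{\varphi(n - 1)}{1!} + \frac{\varphi(n - 2)}{2!} + \cdots + \frac{\varphi(0)}{n!} = 1 \]

which gives successively \( \varphi(1), \varphi(2), \varphi(3), \ldots \) by taking

n = 1, 2, 3, ....

The general result, which can easily be verified, is

\[ \varphi(n) = \sum_{k=0}^{n} \frac{(-1)^k}{k!}, \]

or, in an explicit form,

\[ \varphi(n) = 1 - \frac{1}{1} + \frac{1}{1 \cdot 2} - \frac{1}{1 \cdot 2 \cdot 3} + \cdots + \frac{(-1)^n}{1 \cdot 2 \cdot 3 \cdots n}. \]

Even for moderate n this is very near to

\[ \frac{1}{e} = 1 - \frac{1}{1} + \frac{1}{1 \cdot 2} - \frac{1}{1 \cdot 2 \cdot 3} + \cdots \ \text{ad inf.} = 0.36787944. \]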
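
The recurrence for the probability of no coincidences, its closed form, and the formula for exactly i coincidences can all be checked against a complete list of the n! orders of drawing. A minimal sketch, assuming exact fractions; the names `phi_recurrence`, `phi_closed` and `coincidence_distribution` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def phi_recurrence(n_max):
    """phi(0..n_max) from phi(n) + phi(n-1)/1! + ... + phi(0)/n! = 1."""
    phi = [Fraction(1)]  # phi(0) = 1 by the convention adopted in the text
    for n in range(1, n_max + 1):
        phi.append(1 - sum(phi[n - k] / factorial(k) for k in range(1, n + 1)))
    return phi


def phi_closed(n):
    """Closed form: sum of (-1)^k / k! for k = 0..n."""
    return sum(Fraction((-1) ** k, factorial(k)) for k in range(n + 1))


def coincidence_distribution(n):
    """Probabilities of exactly 0, 1, ..., n coincidences by enumeration."""
    counts = [0] * (n + 1)
    for order in permutations(range(n)):
        hits = sum(1 for place, ticket in enumerate(order) if place == ticket)
        counts[hits] += 1
    return [Fraction(c, factorial(n)) for c in counts]


if __name__ == "__main__":
    phi = phi_recurrence(6)
    print([str(v) for v in phi])
    print([str(v) for v in coincidence_distribution(5)])
    print(float(phi_closed(10)))
```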


Mathematical Expectation of a Product 
5. For the product of two or more stochastic variables we do not
possess anything so general as the foregoing theorem concerning the
mathematical expectation of sums. An analogous theorem with respect
to the product of stochastic variables can be established only under
certain restrictive conditions.

Several stochastic variables are called “independent” if the probability
for any one of them to assume a determined value does not depend
on the values assumed by the remaining variables. For instance, if the
variables are the numbers of points on dice, they may be considered as
independent.

On the other hand, we have a case of dependent variables in numbers 
on tickets drawn in a lottery. For, in this case the fact that certain 
tickets have determined numbers precludes the possibility of any one of 
these numbers appearing on other tickets drawn at the same time.

If more than two variables are independent according to the above
definition, it is clear that any two of them are independent. But the
converse is not true: It is easy to imagine cases when any two of the
variables are independent and yet they are not independent when taken
in their totality. Therefore, when speaking of independence of variables,
we must always specify whether they are independent in their totality 
or only in pairs. 
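
One such case can be exhibited explicitly. The example below is a standard one and an assumption of this sketch, not taken from the text: x and y are two independent variables taking the values 0 and 1 with probability 1/2 each, and z is the remainder of x + y on division by 2. The helper names `expectation` and `independent_in_pair` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Four equally likely outcomes (x, y, z) with z = (x + y) mod 2.
OUTCOMES = [(x, y, (x + y) % 2) for x, y in product((0, 1), repeat=2)]
WEIGHT = Fraction(1, len(OUTCOMES))


def expectation(f):
    """Mathematical expectation of f(x, y, z) over the four outcomes."""
    return sum(WEIGHT * f(*outcome) for outcome in OUTCOMES)


def independent_in_pair(i, j):
    """True if coordinates i and j satisfy P(u, v) = P(u) P(v) for all u, v."""
    for u, v in product((0, 1), repeat=2):
        joint = sum(WEIGHT for o in OUTCOMES if o[i] == u and o[j] == v)
        first = sum(WEIGHT for o in OUTCOMES if o[i] == u)
        second = sum(WEIGHT for o in OUTCOMES if o[j] == v)
        if joint != first * second:
            return False
    return True


if __name__ == "__main__":
    print([independent_in_pair(i, j) for i, j in ((0, 1), (0, 2), (1, 2))])
    # The product xyz always vanishes, although each factor has expectation 1/2.
    print(expectation(lambda x, y, z: x * y * z))
```

Any two of the three variables are independent, yet the value of z is fixed once x and y are known, so the three are not independent in their totality.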

For two independent variables we have the following simple theorem:

Theorem. The mathematical expectation of the product xy of two
independent variables x and y is equal to the product of their expectations;
or, in symbols

\[ E(xy) = E(x)E(y). \]

Proof. Let \( x_1, x_2, \ldots, x_m \) be the complete set of values for x, and
\( y_1, y_2, \ldots, y_n \) the analogous set for y. Denoting the probability of
x being equal to \( x_i \) by \( p_i \), and similarly, the probability of y being equal
to \( y_j \) by \( q_j \), the events

\[ x = x_i \quad \text{and} \quad y = y_j \]

are independent by definition of independence, because the probability
of x being equal to \( x_i \) is not affected by the fact that y has assumed any
one of its possible values, and it remains \( p_i \).

By the theorem of compound probability the simultaneous occurrence
of the events

\[ x = x_i \quad \text{and} \quad y = y_j \]

has the probability \( p_iq_j \). Again, by the extended definition of mathematical
expectation

\[ E(xy) = \sum_{i=1}^{m} \sum_{j=1}^{n} p_iq_jx_iy_j \]

because the values of the product xy are determined by mn exhaustive
and mutually exclusive cases

\[ x = x_i, \quad y = y_j \]

i = 1, 2, ... m; j = 1, 2, ... n.

Now, performing the summation with respect to j first, while i remains
constant, we have

\[ \sum_{j=1}^{n} p_iq_jx_iy_j = p_ix_i \sum_{j=1}^{n} q_jy_j = p_ix_iE(y), \]

and again

\[ E(xy) = \sum_{i=1}^{m} p_ix_iE(y) = E(y) \sum_{i=1}^{m} p_ix_i, \]

or

\[ E(xy) = E(x)E(y). \]

This theorem can be extended to the case of several factors independent
in their totality. For instance, if x, y, z are independent, it is
obvious that xy and z are also independent. Hence

\[ E(xyz) = E(xy)E(z), \]

and again

\[ E(xyz) = E(x)E(y)E(z). \]

In a similar way we can extend this theorem to any number of independent
factors.

As an important application, let us consider two independent variables
x and y with the respective expectations a and b. The variables x − a
and y − b being independent also, we have

\[ E(x - a)(y - b) = E(x - a)E(y - b); \]

but

\[ E(x - a) = E(x) - a = a - a = 0; \]

therefore

(5) \[ E(x - a)(y - b) = 0. \]


Dispersion and Standard Deviation 
6. Let x be a variable and a its mathematical expectation. The
expectation of

\[ (x - a)^2 \]

is called “dispersion” of the variable, and the square root of dispersion
is usually called “standard deviation.” As

\[ (x - a)^2 = x^2 - 2ax + a^2 \]

we can apply the theorem on the expectation of sums to the right-hand
member of this identity and find

\[ E(x - a)^2 = E(x^2) - 2aE(x) + a^2 = E(x^2) - a^2 \]

or, denoting by b the expectation of \( x^2 \),

(6) \[ E(x - a)^2 = b - a^2. \]

Thus, the computation of dispersion can be reduced to the computation
of the expectation of the variable itself and its square. Also, denoting
by \( \sigma \) the standard deviation of x, we have the formula

\[ \sigma^2 = b - a^2. \]

For instance, if the variable is the number of points on a die, we have

\[ a = \frac{7}{2}, \qquad b = \frac{1^2 + 2^2 + \cdots + 6^2}{6} = \frac{91}{6} \]

and

\[ \sigma^2 = \frac{91}{6} - \frac{49}{4} = 2.917; \qquad \sigma = 1.708. \]
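
The figures for the die can be reproduced both from formula (6) and directly from the definition of dispersion. A minimal sketch, assuming exact fractions; the function name `die_moments` is hypothetical.

```python
import math
from fractions import Fraction


def die_moments():
    """Return a = E(x), b = E(x^2), and the dispersion computed two ways."""
    values = range(1, 7)
    a = sum(Fraction(v, 6) for v in values)
    b = sum(Fraction(v * v, 6) for v in values)
    by_formula = b - a ** 2  # formula (6)
    by_definition = sum(Fraction(1, 6) * (v - a) ** 2 for v in values)
    return a, b, by_formula, by_definition


if __name__ == "__main__":
    a, b, by_formula, by_definition = die_moments()
    print(a, b, by_formula, by_definition)
    print(round(float(by_formula), 3), round(math.sqrt(by_formula), 3))
```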


Dispersion of Sums 

7. It is important to have a convenient formula to find the dispersion
of a sum

\[ s = x_1 + x_2 + \cdots + x_n \]

of several stochastic variables. The expectation of s is given by

\[ E(s) = E(x_1) + E(x_2) + \cdots + E(x_n) \]

or

\[ E(s) = a_1 + a_2 + \cdots + a_n, \]

denoting by \( a_i \) the expectation of \( x_i \). The deviation of s from its
expectation is, therefore,

\[ x_1 + x_2 + \cdots + x_n - (a_1 + a_2 + \cdots + a_n), \]

and we have to find the expectation of

\[ (x_1 + x_2 + \cdots + x_n - a_1 - a_2 - \cdots - a_n)^2. \]

Now we have identically

\[ (x_1 + x_2 + \cdots + x_n - a_1 - a_2 - \cdots - a_n)^2 = \sum_{i=1}^{n} (x_i - a_i)^2 + 2\sum_{i,j} (x_i - a_i)(x_j - a_j), \]

the last sum being extended over all the different combinations of
subscripts i and j for which \( i \ne j \) and consisting of n(n − 1)/2 terms.
The mathematical expectation of a sum being equal to the sum of the
expectations of its terms, we must find the expectations of the terms

\[ (x_i - a_i)^2 \quad \text{and} \quad (x_i - a_i)(x_j - a_j). \]

The first is the dispersion of \( x_i \) and can be found from (6); namely,

\[ E(x_i - a_i)^2 = b_i - a_i^2 = \sigma_i^2 \]

if \( b_i \) is the expectation of \( x_i^2 \).

As to

\[ E(x_i - a_i)(x_j - a_j), \]

instead of it we introduce the so-called “correlation coefficient” of \( x_i \)
and \( x_j \)

\[ R_{i,j} = \frac{E(x_i - a_i)(x_j - a_j)}{\sigma_i\sigma_j}. \]

Denoting the required dispersion by D, we obtain

(7) \[ D = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2 + 2R_{1,2}\sigma_1\sigma_2 + 2R_{1,3}\sigma_1\sigma_3 + \cdots + 2R_{n-1,n}\sigma_{n-1}\sigma_n \]

so that the dispersion of a sum can be obtained as soon as we know the
dispersion of its terms and their correlation coefficients.

In an important case, expression (7) for dispersion can be greatly
simplified. If the variables \( x_1, x_2, \ldots, x_n \) are independent in pairs, we
see from (5) that all the correlation coefficients are = 0, so that in this
case simply

(8) \[ D = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2 = b_1 - a_1^2 + b_2 - a_2^2 + \cdots + b_n - a_n^2. \]

In other words, the dispersion of a sum of variables, any two of which
are independent, is equal to the sum of dispersions of its terms.

8. A few examples will serve to illustrate the use of these formulas.

Problem 7. Find the dispersion of the number of successes in a series
of $n$ independent trials with probabilities $p_1, p_2, \ldots, p_n$ corresponding to
the first, second, ..., $n$th trial.

Solution. As in Prob. 2 we associate with every trial a variable which
assumes the value 1 or 0, according as the trial resulted in success or
failure. These variables $x_1, x_2, \ldots, x_n$ are independent because the
trials are supposed to be independent. The number of successes

$$m = x_1 + x_2 + \cdots + x_n$$

is thus the sum of the independent variables. To find the dispersion of
any one of these variables $x_i$ we notice that

$$E(x_i) = 1 \cdot p_i + 0 \cdot q_i = p_i$$
$$E(x_i^2) = 1 \cdot p_i + 0 \cdot q_i = p_i;$$

therefore the dispersion of $x_i$ is

$$\sigma_i^2 = p_i - p_i^2 = p_i q_i$$

and by (8)

$$D = E(m - p_1 - p_2 - \cdots - p_n)^2 = p_1 q_1 + p_2 q_2 + \cdots + p_n q_n.$$

In the Bernoullian case of independent trials with the same probability
$p$, we have $p_1 = p_2 = \cdots = p_n = p$ and

$$E(m - np)^2 = npq.$$

This formula is equivalent to the relation

$$\sum_{m=0}^{n} T_m (m - np)^2 = npq$$

established on page 116.
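The result of Problem 7 can be checked by brute force for a small number of trials. The sketch below is only an illustration, not part of the original solution: the function names and the sample probabilities are chosen here, and exact rational arithmetic is used so that the two sides can be compared without rounding.

```python
from fractions import Fraction
from itertools import product


def exact_dispersion(probs):
    """Dispersion of the number of successes, by enumerating all 2^n outcomes."""
    probs = [Fraction(p) for p in probs]
    mean = sum(probs)
    total = Fraction(0)
    for outcome in product((0, 1), repeat=len(probs)):
        weight = Fraction(1)
        for hit, p in zip(outcome, probs):
            weight *= p if hit else 1 - p
        total += weight * (sum(outcome) - mean) ** 2
    return total


def formula_dispersion(probs):
    """Right-hand side obtained from (8): p_1 q_1 + ... + p_n q_n."""
    return sum(Fraction(p) * (1 - Fraction(p)) for p in probs)


# Hypothetical probabilities, chosen only for the illustration.
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(3, 4), Fraction(1, 5)]
print(exact_dispersion(probs), formula_dispersion(probs))
```

With equal probabilities the same functions reproduce the Bernoullian value $npq$.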

Problem 8. In a lottery $m$ tickets are drawn at a time out of $n$
tickets numbered from 1 to $n$. Find the dispersion of the sum $s$ of the
numbers on the tickets drawn.

Solution. Let $x_1, x_2, \ldots, x_m$ be the variables representing the
numbers on the first, second, ..., $m$th tickets. By Prob. 5 we know that

$$E(x_i) = \frac{1 + 2 + \cdots + n}{n} = \frac{n+1}{2};$$

and in a similar way we find

$$E(x_i^2) = \frac{1^2 + 2^2 + \cdots + n^2}{n} = \frac{(n+1)(2n+1)}{6},$$

whence the dispersion of $x_i$ is

$$\sigma_i^2 = \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{n^2 - 1}{12}.$$

Since we deal in the present case with dependent variables, we must
find the correlation coefficients, or, which is the same,

$$E\left(x_i - \frac{n+1}{2}\right)\left(x_j - \frac{n+1}{2}\right)$$

for every pair of subscripts $i$ and $j$. The variable $x_i$ may have any of
the values $1, 2, 3, \ldots, n$ with the same probability $1/n$; and $x_j$ may
have any of the same values, with the exception of that assumed by $x_i$,
with the probability $1/(n-1)$, so that the preceding expression consists of
terms

$$\frac{1}{n(n-1)}\left(x_i - \frac{n+1}{2}\right)\left(x_j - \frac{n+1}{2}\right)$$

where $x_j$, for given $x_i = 1, 2, \ldots, n$, ranges over all numbers $1, 2,
3, \ldots, n$ with the exception of $x_i$. As

$$\sum_{t=1}^{n}\left(t - \frac{n+1}{2}\right) = 0$$

it is obvious that

$$\sum_{x_j \neq x_i}\left(x_j - \frac{n+1}{2}\right) = -\left(x_i - \frac{n+1}{2}\right)$$

and

$$E\left(x_i - \frac{n+1}{2}\right)\left(x_j - \frac{n+1}{2}\right) = -\frac{1}{n(n-1)}\sum_{x_i=1}^{n}\left(x_i - \frac{n+1}{2}\right)^2 = -\frac{n+1}{12}.$$

Everything now is ready for the application of (7). All simplifications
performed, we get the following expression of the required dispersion

$$D = \frac{m(n^2 - 1)}{12}\left(1 - \frac{m-1}{n-1}\right).$$

If the variables were independent, the dispersion would be

$$\frac{m(n^2 - 1)}{12}.$$

The dependence diminishes it, but the influence of dependence is not great
if the ratio $m/n$ is small.
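For small lotteries the dispersion in Problem 8 can be verified by listing every possible draw. The following is a minimal sketch under that assumption (equally likely unordered draws); the function names are introduced here for illustration only.

```python
from fractions import Fraction
from itertools import combinations


def lottery_dispersion_exact(n, m):
    """Dispersion of the sum of m tickets drawn from 1..n, by full enumeration."""
    draws = list(combinations(range(1, n + 1), m))
    mean = Fraction(sum(sum(d) for d in draws), len(draws))
    return sum((sum(d) - mean) ** 2 for d in draws) / len(draws)


def lottery_dispersion_formula(n, m):
    """Closed form: m(n^2 - 1)/12 * (1 - (m - 1)/(n - 1))."""
    return Fraction(m * (n * n - 1), 12) * (1 - Fraction(m - 1, n - 1))


print(lottery_dispersion_exact(7, 3), lottery_dispersion_formula(7, 3))
```

Setting $m = n$ gives dispersion 0, as it must, since the sum of all tickets is then fixed.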


Problems for Solution 

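1. Find the mathematical expectation $M$ of the absolute value of the discrepancy
$m - np$ in a series of $n$ independent trials with constant probability $p$. Ans. By
definition

$$M = \sum_{m=0}^{n} T_m |m - np|$$

where, as usual,

$$T_m = \frac{n!}{m!(n-m)!} p^m q^{n-m}.$$

But since

$$\sum_{m=0}^{n} T_m (m - np) = 0,$$

we have also

$$M = 2\sum_{m > np} T_m (m - np),$$

the sum being extended over all integers $m$ which are $> np$. Denoting by $F(x, y)$ the
sum

$$F(x, y) = \sum_{m > np} C_n^m x^m y^{n-m}$$

we have

$$\sum_{m > np} T_m (m - np) = p\frac{\partial F}{\partial p} - npF(p, q).$$

On the other hand, by Euler's theorem on homogeneous functions

$$nF(p, q) = p\frac{\partial F}{\partial p} + q\frac{\partial F}{\partial q},$$

whence

$$\sum_{m > np} T_m (m - np) = pq\left(\frac{\partial F}{\partial p} - \frac{\partial F}{\partial q}\right) = npq\, C_{n-1}^{\mu-1} p^{\mu-1} q^{n-\mu}.$$

Here $\mu$ represents an integer determined by $\mu - 1 \leq np < \mu$.
The answer is therefore given by the simple formula

$$M = 2npq\, C_{n-1}^{\mu-1} p^{\mu-1} q^{n-\mu}.$$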
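A numerical check of this problem is easy to set up. The sketch below compares the defining sum for the mean absolute discrepancy with the classical closed form $2npq\,C_{n-1}^{\mu-1}p^{\mu-1}q^{n-\mu}$, where $\mu$ is taken as the smallest integer exceeding $np$; that closed form and the function names are assumptions stated here for the illustration, and $0 < p < 1$ is assumed.

```python
from fractions import Fraction
from math import comb, floor


def mean_abs_discrepancy(n, p):
    """E|m - np| computed directly from the binomial probabilities T_m."""
    p = Fraction(p)
    q = 1 - p
    return sum(comb(n, m) * p**m * q**(n - m) * abs(m - n * p)
               for m in range(n + 1))


def closed_form(n, p):
    """2npq * C(n-1, mu-1) * p^(mu-1) * q^(n-mu), mu = smallest integer > np."""
    p = Fraction(p)
    q = 1 - p
    mu = floor(n * p) + 1
    return 2 * n * p * q * comb(n - 1, mu - 1) * p**(mu - 1) * q**(n - mu)


print(mean_abs_discrepancy(10, Fraction(1, 3)), closed_form(10, Fraction(1, 3)))
```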




2. By applying Stirling’s formula (Appendix 1, page 347) prove the following 
result: 

where 

/_J_ l_\ 

^ - l’ ng - 1/ 

and n is so large as to make c ^ Ho- 
Hint: 

^ '2npq\ d d' 1 / 1 1 \ 

TT / 2(np — d) 2{nq — d') 24 \np — d nq — d'/ 

2npg\ 1 _1_ d^ __ d'^ 

TT / ^ 12(717^ — d) 12(ng — d') 4t{np — d)^ 4i{nq — i?')* 


log 


log 


w 

(mJ 
3. What is the expectation of the number of failures preceding the first success in
an indefinite series of independent trials with the probability $p$?

Ans. $qp + 2q^2p + 3q^3p + \cdots = \dfrac{qp}{(1-q)^2} = \dfrac{q}{p}$.

4. Balls are taken one by one out of an urn containing $a$ white and $b$ black balls
until the first white ball is drawn. What is the expectation of the number of black
balls preceding the first white ball?

Ans. 1. By direct application of definition the following first expression for the
required expectation $M$ is obtained:

$$M = \frac{a}{a+b}\left[\frac{b}{a+b-1} + 2\,\frac{b(b-1)}{(a+b-1)(a+b-2)} + 3\,\frac{b(b-1)(b-2)}{(a+b-1)(a+b-2)(a+b-3)} + \cdots\right].$$

Ans. 2. However, it is possible to find a simpler expression for $M$. Denote by $x_1$ the
number of black balls preceding the first white ball, by $x_2$ the number of black balls
between the first and second white ball, and so on; finally, by $x_{a+1}$ the number of black
balls following the last white ball. We have

$$x_1 + x_2 + \cdots + x_{a+1} = b$$
$$E(x_1) + E(x_2) + \cdots + E(x_{a+1}) = b.$$

But as the probability of every sequence of balls (that is, of every system of numbers
$x_1, x_2, \ldots, x_{a+1}$) is the same, namely,

$$\frac{a!\,b!}{(a+b)!},$$

it is easy to see that

$$E(x_1) = E(x_2) = \cdots = E(x_{a+1}) = M.$$

That is,

$$(a+1)M = b, \qquad M = \frac{b}{a+1}.$$

Equating this to the preceding expression for $M$, an interesting identity can be
obtained, whose direct proof is left to the student.
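The identity between the two answers can at least be tested numerically before a direct proof is attempted. In the sketch below the expectation is accumulated draw by draw from the definition and compared with $b/(a+1)$; the function name is hypothetical and the urn sizes are arbitrary.

```python
from fractions import Fraction


def expected_blacks_direct(a, b):
    """E(number of black balls before the first white one), from the definition."""
    total = Fraction(0)
    first_k_black = Fraction(1)  # probability that the first k draws are all black
    for k in range(b + 1):
        remaining = a + b - k
        # k blacks followed by a white ball on draw k + 1
        total += k * first_k_black * Fraction(a, remaining)
        if k < b:
            first_k_black *= Fraction(b - k, remaining)
    return total


print(expected_blacks_direct(3, 5), Fraction(5, 3 + 1))
```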

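5. In Prob. 6, page 168, to determine the probability $\varphi(n)$, we had an equation

$$\varphi(n) + \frac{\varphi(n-1)}{1!} + \frac{\varphi(n-2)}{2!} + \cdots + \frac{\varphi(0)}{n!} = 1; \qquad \varphi(0) = 1.$$

Find the general expression for $\varphi(n)$ using the method of generating functions. Ans.
Let

$$F(x) = \varphi(0) + \varphi(1)x + \varphi(2)x^2 + \cdots$$

be the generating function of $\varphi(n)$. Multiplying this series by

$$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$

we find

$$e^x F(x) = 1 + x + x^2 + \cdots$$

or

$$e^x F(x) = \frac{1}{1-x},$$

whence

$$F(x) = \frac{e^{-x}}{1-x}; \qquad \varphi(n) = 1 - \frac{1}{1!} + \frac{1}{2!} - \cdots + \frac{(-1)^n}{n!}.$$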

6. The total number of balls in an urn is known, but the number of white balls
depends on chance and only its mathematical expectation is known. Find the
probability of drawing a white ball. Ans. Let $N$ be the total number of balls and $M$ the
expectation of the number of white balls. The required probability is $M/N$.

7. Two urns contain, respectively, $a$ white and $b$ black and $\alpha$ white and $\beta$ black
balls. A certain number $c$ (naturally not exceeding $a + b$) of balls is transferred
from the first urn into the second. What is the probability of drawing a white ball
from the second urn after the transfer? Ans. The required probability is

$$\frac{\alpha + c\,\dfrac{a}{a+b}}{\alpha + \beta + c}.$$


8. An urn contains $a$ white and $b$ black balls. After a ball is drawn, it is to be
returned to the urn if it is white; but if it is black, it is to be replaced by a white ball
from another urn. What is the probability of drawing a white ball after the foregoing
operation has been repeated $x$ times? Ans. Denote by $M_x$ the expectation of the
number of white balls after $x$ operations. From the equation

$$M_{x+1} = \left(1 - \frac{1}{a+b}\right)M_x + 1$$

the following expression for $M_x$ can be derived:

$$M_x = a + b - b\left(1 - \frac{1}{a+b}\right)^x.$$

It follows that the required probability is

$$1 - \frac{b}{a+b}\left(1 - \frac{1}{a+b}\right)^x.$$

9. Urns 1 and 2 contain, respectively, $a$ white and $b$ black and $c$ white and $d$ black
balls. One ball is taken from the first urn and transferred into the second, while
simultaneously one ball taken from the second urn is transferred into the first. What
is the probability of drawing a white ball from the first urn after such an exchange
has been repeated $x$ times? Ans. Let $M_x$ and $P_x$ represent the mathematical
expectations of the number of white balls in the first and second urn after $x$ exchanges. Then

$$M_{x+1} = M_x + \frac{P_x}{c+d} - \frac{M_x}{a+b}, \qquad M_x + P_x = a + c$$

whence

$$M_x = \frac{(a+c)(a+b)}{a+b+c+d} + \frac{ad - bc}{a+b+c+d}\left(1 - \frac{1}{a+b} - \frac{1}{c+d}\right)^x.$$

10. An urn contains $pN$ white and $qN$ black balls, the total number of balls being
$N$. Balls are drawn one by one (without being returned to the urn) until a certain
number $n$ of balls is reached. What is the dispersion of the number $m$ of white balls
drawn? Ans. Let $x_i = 1$ if the $i$th ball drawn is white and $x_i = 0$ if it is black.
We have

$$E(x_i) = p, \qquad E(m) = np, \qquad E(x_i^2) = p$$

and

$$E(x_i - p)(x_j - p) = E(x_i x_j) - p^2 = -\frac{pq}{N-1}.$$

The required dispersion is

$$D = E(m - np)^2 = npq\,\frac{N-n}{N-1}.$$
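The dispersion of the number of white balls drawn without replacement can be checked against the exact distribution of that number. This is a minimal sketch with hypothetical function names; `K` stands for the number $pN$ of white balls, and the exact probabilities are the usual hypergeometric ones.

```python
from fractions import Fraction
from math import comb


def dispersion_exact(N, K, n):
    """Dispersion of the number of white balls among n drawn from N (K white)."""
    ways = comb(N, n)
    mean = Fraction(n * K, N)
    return sum(Fraction(comb(K, m) * comb(N - K, n - m), ways) * (m - mean) ** 2
               for m in range(n + 1))


def dispersion_formula(N, K, n):
    """npq (N - n)/(N - 1) with p = K/N."""
    p = Fraction(K, N)
    return n * p * (1 - p) * Fraction(N - n, N - 1)


print(dispersion_exact(12, 5, 4), dispersion_formula(12, 5, 4))
```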

11. In a lottery containing $n$ numbers $(1, 2, 3, \ldots, n)$ $m$ numbers are drawn at a
time. Let $x_i$ represent the frequency of a specified number $i$ in $N$ drawings. Prove
that

$$E(x_i) = Np, \qquad E(x_i - Np)^2 = Npq$$
$$E(x_i - Np)(x_j - Np) = Np(p' - p); \qquad (i \neq j)$$

where

$$p = \frac{m}{n}, \qquad q = 1 - p, \qquad p' = \frac{m-1}{n-1}.$$


12. Let

$$z_i = (x_i - Np)^2 - Npq.$$

Show that the dispersion of the sum

$$z_1 + z_2 + \cdots + z_n$$

is

$$D = \frac{2N(N-1)}{n-1}(npq)^2.$$

Indication of the Proof. Let $N$ variables $\xi_1, \xi_2, \ldots, \xi_N$ be defined as follows:

$\xi_k = -p$ if in the $k$th drawing the number $i$ fails to appear
$\xi_k = q$ if in the $k$th drawing the number $i$ appears.

In a similar way, we can define $N$ variables $\eta_1, \eta_2, \ldots, \eta_N$ associated with the
number $j \neq i$. Since

$$x_i - Np = \xi_1 + \xi_2 + \cdots + \xi_N$$
$$x_j - Np = \eta_1 + \eta_2 + \cdots + \eta_N$$

we have

$$e^{u(x_i - Np)} \cdot e^{v(x_j - Np)} = e^{u\xi_1 + v\eta_1} \cdots e^{u\xi_N + v\eta_N}.$$

The variables

$$e^{u\xi_1 + v\eta_1}, \ldots, e^{u\xi_N + v\eta_N}$$

being independent, we have

$$E\left(e^{u(x_i - Np) + v(x_j - Np)}\right) = E\left(e^{u\xi_1 + v\eta_1}\right) \cdots E\left(e^{u\xi_N + v\eta_N}\right).$$

But

$$E\left(e^{u\xi_1 + v\eta_1}\right) = E\left(e^{u\xi_2 + v\eta_2}\right) = \cdots = E\left(e^{u\xi_N + v\eta_N}\right) =$$
$$= pp'e^{q(u+v)} + p(1 - p')e^{qu - pv} + p(1 - p')e^{qv - pu} + (1 - 2p + pp')e^{-p(u+v)} =$$
$$= F(u, v).$$

Hence

$$E\left(e^{u(x_i - Np) + v(x_j - Np)}\right) = F(u, v)^N.$$

It suffices to expand both members into power series in $u$ and $v$ and compare terms
involving $u^2v^2$ to find

$$E(z_i z_j); \qquad i \neq j.$$

The rest does not present serious difficulties except for somewhat complicated
calculations.

13. A box contains $2^n$ tickets among which $C_n^i$ tickets bear the number $i$ ($i =
0, 1, 2, \ldots, n$). A group of $m$ tickets is drawn; denoting by $s$ the sum of their
numbers, it is required to find the expectation $E$ and the dispersion $D$ of $s$.

Ans. $E = \dfrac{1}{2}mn; \qquad D = \dfrac{1}{4}mn - \dfrac{m(m-1)n}{4(2^n - 1)}$.
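The answer to Problem 13 can be confirmed on a small box by enumerating every group of tickets, and then evaluated for the box of 64 tickets ($n = 6$, $m = 10$). The sketch assumes every group of $m$ tickets is equally likely; the function names are chosen here for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def box(n):
    """The 2^n tickets: C(n, i) tickets bear the number i."""
    return [i for i in range(n + 1) for _ in range(comb(n, i))]


def moments_exact(n, m):
    """Expectation and dispersion of the sum of m tickets, by enumeration."""
    sums = [sum(group) for group in combinations(box(n), m)]
    mean = Fraction(sum(sums), len(sums))
    disp = sum((s - mean) ** 2 for s in sums) / len(sums)
    return mean, disp


def moments_formula(n, m):
    """E = mn/2 and D = mn/4 - m(m-1)n / (4(2^n - 1))."""
    return Fraction(m * n, 2), Fraction(m * n, 4) - Fraction(m * (m - 1) * n, 4 * (2**n - 1))


print(moments_exact(3, 3), moments_formula(3, 3))
print(moments_formula(6, 10))  # expectation 30, dispersion 90/7 = 12.857...
```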

14. A box contains $k$ varieties of objects, the number of objects of each variety
being the same. These objects are drawn one at a time and put back before the
next drawing. Denoting by $n$ the smallest number of drawings which produce
objects of all varieties, find $E(n)$ and $E(n^2)$. Ans.

$$E(n) = k\left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{k}\right)$$


E{n^) — E(n) 


■f 




Use the result of Prob. 12, p. 41. 
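For a concrete $k$ the two moments can be obtained by a route different from the one suggested in the hint: the waiting time is split into $k$ independent stages, the stage that starts with `seen` varieties already found being a geometric waiting time with success probability $(k - \text{seen})/k$. This decomposition is a standard alternative, not the method of Prob. 12, p. 41, and the function name is hypothetical.

```python
from fractions import Fraction


def collector_moments(k):
    """Mean and dispersion of the number of drawings needed to see all k varieties."""
    mean = Fraction(0)
    disp = Fraction(0)
    for seen in range(k):
        p = Fraction(k - seen, k)   # chance that the next drawing shows a new variety
        mean += 1 / p               # mean of a geometric waiting time
        disp += (1 - p) / p**2      # its dispersion; the stages are independent
    return mean, disp


mean, disp = collector_moments(4)
print(mean, disp, float(disp))  # 25/3, 130/9, about 14.44
```

For four varieties (the four suits of a deck) this gives the expectation $8\tfrac{1}{3}$ and the dispersion 14.44.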
CHAPTER X


THE LAW OF LARGE NUMBERS 


1. The developments of the preceding chapter, combined with a
simple lemma due to Tshebysheff, lead in a natural and easy way to a
far-reaching generalization of Bernoulli's theorem, known under the
name of the "law of large numbers."

Tshebysheff's Lemma. Let $u$ be a variable which does not assume
negative values, and $a$ its mathematical expectation. The probability of the
inequality

$$u \leq at^2$$

is always greater than

$$1 - \frac{1}{t^2}$$

whatever $t$ may be.

Proof. Let

$$u_1, u_2, \ldots, u_n$$

be all the possible values of the variable $u$ and

$$p_1, p_2, \ldots, p_n$$

their respective probabilities. By the definition of mathematical
expectation, we have

$$p_1u_1 + p_2u_2 + \cdots + p_nu_n = a. \tag{1}$$

We may suppose the notations so chosen that

$$u_1, u_2, \ldots, u_\alpha$$

are all the values of $u$ which are $\leq at^2$, the remaining values

$$u_{\alpha+1}, u_{\alpha+2}, \ldots, u_n$$

being $> at^2$. If all the terms in (1) with subscripts $1, 2, \ldots, \alpha$ are
dropped, the left-hand member can only be diminished, since these
terms are positive or at least nonnegative by hypothesis. We have,
therefore,

$$p_{\alpha+1}u_{\alpha+1} + \cdots + p_nu_n \leq a.$$

But as

$$u_i > at^2$$

for $i = \alpha + 1, \alpha + 2, \ldots, n$ a still stronger inequality,

$$at^2(p_{\alpha+1} + \cdots + p_n) < a$$

or

$$p_{\alpha+1} + \cdots + p_n < \frac{1}{t^2}$$

will hold.

Here the left-hand member represents the probability $Q$ of the
inequality

$$u > at^2$$

because this inequality can materialize only in the following mutually
exclusive forms: either $u = u_{\alpha+1}$, or $u = u_{\alpha+2}$, ..., or $u = u_n$, whose
probabilities are, respectively, $p_{\alpha+1}, p_{\alpha+2}, \ldots, p_n$. Thus

$$Q < \frac{1}{t^2}.$$

But if $P$ is the probability of the opposite event

$$u \leq at^2$$

we must have

$$P + Q = 1,$$

whence

$$P > 1 - \frac{1}{t^2}$$

which proves the lemma.
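The lemma is easy to test on any finite distribution of a nonnegative variable. The sketch below uses a small distribution invented for the illustration (values 0, 1, 4, 10 with probabilities 1/2, 1/4, 1/8, 1/8) and compares the probability of $u \leq at^2$ with the bound $1 - 1/t^2$ for several $t$; the function name is hypothetical.

```python
from fractions import Fraction


def lemma_check(values, probs, t):
    """Return (probability of u <= a t^2, lower bound 1 - 1/t^2)."""
    t = Fraction(t)
    a = sum(p * u for u, p in zip(values, probs))  # mathematical expectation of u
    prob = sum(p for u, p in zip(values, probs) if u <= a * t * t)
    return prob, 1 - 1 / t**2


values = [0, 1, 4, 10]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
for t in (1, Fraction(3, 2), 2, 3):
    print(t, *lemma_check(values, probs, t))
```

For $t \leq 1$ the bound is not positive and the statement is trivial; the lemma carries information only for $t > 1$.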

2. Let $x_1, x_2, \ldots, x_n$ be a set of stochastic variables and $a_1, a_2, \ldots, a_n$
their respective expectations. The dispersion of the sum

$$x_1 + x_2 + \cdots + x_n$$

which we shall denote by $B_n$ is, by definition, the mathematical
expectation of the variable

$$u = (x_1 + x_2 + \cdots + x_n - a_1 - a_2 - \cdots - a_n)^2.$$

Tshebysheff's lemma, applied to this variable $u$, shows that the
probability of the inequality

$$(x_1 + x_2 + \cdots + x_n - a_1 - a_2 - \cdots - a_n)^2 \leq B_n t^2$$

is greater than

$$1 - \frac{1}{t^2}.$$

But the preceding inequality is equivalent to two inequalities

$$-t\sqrt{B_n} \leq x_1 + x_2 + \cdots + x_n - a_1 - a_2 - \cdots - a_n \leq t\sqrt{B_n}$$

or, dividing through by $n$,

$$-\frac{t\sqrt{B_n}}{n} \leq \frac{x_1 + x_2 + \cdots + x_n}{n} - \frac{a_1 + a_2 + \cdots + a_n}{n} \leq \frac{t\sqrt{B_n}}{n}.$$

Hence, the probability of these inequalities for an arbitrary positive $t$
is greater than

$$1 - \frac{1}{t^2}.$$

Let $\varepsilon$ be an arbitrary positive number. Defining $t$ by the equation

$$\frac{t\sqrt{B_n}}{n} = \varepsilon$$

whence

$$\frac{1}{t^2} = \frac{B_n}{n^2\varepsilon^2}$$

we arrive at the following conclusion: The probability $P$ of the inequalities

$$-\varepsilon \leq \frac{x_1 + x_2 + \cdots + x_n}{n} - \frac{a_1 + a_2 + \cdots + a_n}{n} \leq \varepsilon$$

equivalent to a single inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n} - \frac{a_1 + a_2 + \cdots + a_n}{n}\right| \leq \varepsilon$$

is greater than

$$1 - \frac{B_n}{n^2\varepsilon^2}.$$

Thus far nothing has been supposed about the behavior of $B_n$ for
indefinitely increasing $n$. We shall now suppose that the quotient
$B_n/n^2$ tends to 0 as $n$ increases indefinitely. Then, having chosen two
arbitrarily small positive numbers $\varepsilon$ and $\eta$, a number $n_0$ can be found so
that the inequality

$$\frac{B_n}{n^2\varepsilon^2} < \eta$$

will hold for $n > n_0$. Consequently, we shall have

$$P > 1 - \eta$$

for all $n > n_0$. This conclusion leads to the following important theorem
due, in the main, to Tshebysheff:
The Law of Large Numbers. With the probability approaching 1 or
certainty as near as we please, we may expect that the arithmetic mean of
values actually assumed by $n$ stochastic variables will differ from the arithmetic
mean of their expectations by less than any given number, however small,
provided the number of variables can be taken sufficiently large and provided
the condition

$$\frac{B_n}{n^2} \to 0 \quad \text{as} \quad n \to \infty$$

is fulfilled.

If, instead of variables $x_i$, we consider new variables $z_i = x_i - a_i$
with their means $= 0$, the same theorem can be stated as follows:

For a fixed $\varepsilon > 0$, however small, the probability of the inequality

$$\left|\frac{z_1 + z_2 + \cdots + z_n}{n}\right| \leq \varepsilon$$

tends to 1 as a limit when $n$ increases indefinitely, provided

$$\frac{B_n}{n^2} \to 0.$$

This theorem is very general. It holds for independent or dependent
variables indifferently if the sufficient condition for its validity, namely,
that

$$\frac{B_n}{n^2} \to 0 \quad \text{as} \quad n \to \infty$$

is fulfilled.
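A simple case in which everything can be computed exactly is that of $n$ independent trials with constant probability $p$, where $B_n = npq$ and $B_n/n^2 \to 0$. The sketch below compares the exact probability of $|m/n - p| \leq \varepsilon$ with the lower bound $1 - B_n/(n^2\varepsilon^2)$ obtained from Tshebysheff's lemma; the function name and the chosen values of $n$, $p$, $\varepsilon$ are assumptions of the illustration.

```python
from fractions import Fraction
from math import comb


def exact_and_bound(n, p, eps):
    """Exact P(|m/n - p| <= eps) for n Bernoulli trials, and 1 - npq/(n^2 eps^2)."""
    p, eps = Fraction(p), Fraction(eps)
    q = 1 - p
    exact = sum(comb(n, m) * p**m * q**(n - m)
                for m in range(n + 1) if abs(Fraction(m, n) - p) <= eps)
    bound = 1 - (n * p * q) / (n * n * eps * eps)
    return exact, bound


for n in (50, 200, 800):
    exact, bound = exact_and_bound(n, Fraction(1, 2), Fraction(1, 10))
    print(n, float(exact), float(bound))
```

The bound is far from sharp, but it already tends to 1, which is all the theorem requires.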

3. This condition, which is recognized as sufficient, is at the same
time necessary, if the variables $z_1, z_2, \ldots, z_n$ are uniformly bounded;
that is, if a constant number (one independent of $n$), $C$, can be found
so that all particular values of $z_i$ ($i = 1, 2, \ldots, n$) are numerically less
than $C$. Let $P$, as before, denote the probability of the inequality

$$|z_1 + z_2 + \cdots + z_n| \leq n\varepsilon.$$

Then the probability of the opposite inequality

$$|z_1 + z_2 + \cdots + z_n| > n\varepsilon$$

will be $1 - P$.

Now, by definition,

$$B_n = E(z_1 + z_2 + \cdots + z_n)^2$$

whence one can easily derive the inequality

$$B_n < n^2C^2(1 - P) + n^2\varepsilon^2P$$

from which it follows that

$$\frac{B_n}{n^2} < C^2(1 - P) + \varepsilon^2P < \varepsilon^2 + C^2(1 - P).$$

If the law of large numbers holds, $1 - P$ converges to 0 when $n$
increases indefinitely, so that the right-hand member for sufficiently
large $n$ becomes less than any given number, and that implies

$$\frac{B_n}{n^2} \to 0 \quad \text{as} \quad n \to \infty$$

which proves the statement.

4. There is an important case in which the law of large numbers
certainly holds; namely, when variables $x_1, x_2, \ldots, x_n$ are independent
and the expectations of their squares are bounded. Then a constant
number $C$ exists such that

$$b_i = E(x_i^2) < C \quad \text{for} \quad i = 1, 2, 3, \ldots.$$

On the other hand, for independent variables

$$B_n = \sum_{i=1}^{n}(b_i - a_i^2) \leq \sum_{i=1}^{n} b_i < nC$$

and

$$\frac{B_n}{n^2} < \frac{C}{n} \to 0 \quad \text{as} \quad n \to \infty.$$

The expectations of squares are bounded, for instance, when all the
variables are uniformly bounded, which is true, for instance, for
"identical" or "equal" variables. Variables are said to be identical if they
possess the same set of values with the same corresponding probabilities.

5. E. Czuber made a complete investigation of the results of 2,854
drawings in a lottery operated in Prague between 1754 and 1886. It
consisted of 90 numbers, of which 5 were taken in each drawing. From
Czuber's book "Wahrscheinlichkeitsrechnung," vol. 1, p. 141 (2d ed.,
1908), we reprint the table shown on page 187.

With the 2,854 drawings, we associate 2,854 variables, $x_1, x_2, \ldots, x_{2854}$
representing the sum of five numbers appearing in each of the 2,854
drawings. These variables are identical and independent with the
common mathematical expectation 227.5. Hence, by the law of large
numbers, we can expect that the arithmetic mean of actually observed
values of these variables will not notably differ from 227.5. To form
the sum

$$S = \sum_{i=1}^{2854} x_i$$
Numbers               Their frequency m    Difference m - 158

6                     138                  -20
39, 65                139                  -19
16, 41, 76, 87        142                  -16
2, 14, 56, 79, 86     143                  -15
18, 44, 47            144                  -14
72, 80                145                  -13
12                    146                  -12
21, 53                147                  -11
70                    149                   -9
24, 32, 55, 69        150                   -8
27, 64, 75            151                   -7
81                    152                   -6
23, 29, 85            153                   -5
19, 35, 42, 74        154                   -4
7, 20, 59             155                   -3
13, 34, 40, 67, 88    156                   -2
11, 52, 68            157                   -1
17, 82                158                    0
15, 90                159                    1
58                    160                    2
8, 25, 36             161                    3
22                    162                    4
33, 57                163                    5
51                    164                    6
3, 43, 45, 48         165                    7
10, 26, 66            166                    8
1, 5, 60, 84          167                    9
50, 62                168                   10
9, 61, 63             170                   12
54, 73                171                   13
49, 71, 78            172                   14
28                    173                   15
37                    176                   18
30, 46                177                   19
89                    178                   20
31                    179                   21
38                    184                   26
4                     185                   27
77                    186                   28
83                    189                   31


we must multiply the frequencies given in the preceding table by the
sum of corresponding numbers. To simplify the task we notice that all
numbers from 1 to 90 actually appeared. Hence, we multiply the
sum of these numbers, 4,095, by 158, which gives:

$$4095 \cdot 158 = 647{,}010,$$

and then add to this number the sum of the differences $m - 158$
multiplied by the sum of the numbers in the same line. The results are:

Sum of positive products     22,336
Sum of negative products    -19,587.

Hence

$$S = 647{,}010 + 22{,}336 - 19{,}587 = 649{,}759$$

and

$$\frac{S}{2854} = 227.67,$$

which differs very little from the expected value 227.5. An even larger
difference would be in perfect agreement with the law of large numbers
since 2,854, the number of variables, is not very great.
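The arithmetic of this section can be repeated mechanically from Czuber's table. In the sketch below the rows of the table are transcribed as pairs (numbers, frequency); the variable and function names are chosen here, and the only inputs are the figures printed in the table.

```python
# (numbers with a common frequency, that frequency m), transcribed from the table
CZUBER = [
    ((6,), 138), ((39, 65), 139), ((16, 41, 76, 87), 142),
    ((2, 14, 56, 79, 86), 143), ((18, 44, 47), 144), ((72, 80), 145),
    ((12,), 146), ((21, 53), 147), ((70,), 149), ((24, 32, 55, 69), 150),
    ((27, 64, 75), 151), ((81,), 152), ((23, 29, 85), 153),
    ((19, 35, 42, 74), 154), ((7, 20, 59), 155), ((13, 34, 40, 67, 88), 156),
    ((11, 52, 68), 157), ((17, 82), 158), ((15, 90), 159), ((58,), 160),
    ((8, 25, 36), 161), ((22,), 162), ((33, 57), 163), ((51,), 164),
    ((3, 43, 45, 48), 165), ((10, 26, 66), 166), ((1, 5, 60, 84), 167),
    ((50, 62), 168), ((9, 61, 63), 170), ((54, 73), 171), ((49, 71, 78), 172),
    ((28,), 173), ((37,), 176), ((30, 46), 177), ((89,), 178), ((31,), 179),
    ((38,), 184), ((4,), 185), ((77,), 186), ((83,), 189),
]


def weighted_sum(rows):
    """S = sum over all numbers of (number * its frequency)."""
    return sum(freq * sum(numbers) for numbers, freq in rows)


def split_products(rows, base=158):
    """Sums of the positive and of the negative products (m - base) * (sum of numbers)."""
    products = [(freq - base) * sum(numbers) for numbers, freq in rows]
    return sum(x for x in products if x > 0), sum(x for x in products if x < 0)


S = weighted_sum(CZUBER)
print(split_products(CZUBER), S, round(S / 2854, 2))
```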

6. The two experiments reported in this section were made by the
author in spare moments. In the first experiment 64 tickets bearing
numbers 0, 1, 2, 3, 4, 5, 6 and occurring in the following proportions:

Number       0    1    2    3    4    5    6
Frequency    1    6   15   20   15    6    1

were vigorously agitated in a tin can and then 10 tickets were drawn at a
time and their numbers added. Altogether 2,500 such drawings were
made and their results carefully recorded. From these records we
derive Tables I and II.


Table I

Number    Frequency observed    Expected frequency    Discrepancy

0            404                   390.625              +13.375
1          2,321                 2,343.75               -22.75
2          5,850                 5,859.375               -9.375
3          7,863                 7,812.5                +50.5
4          5,821                 5,859.375              -38.375
5          2,344                 2,343.75                +0.25
6            397                   390.625               +6.375


The next table gives the absolute values of differences $s - 30$ where $s$
is the sum of the numbers on 10 tickets drawn at one time, and their
respective frequencies.

From Table I it is easy to find that the arithmetic mean of all 2,500
sums observed is:

$$\frac{74996}{2500} = 29.9984$$
Table II

|s - 30|    Frequency observed    |s - 30|    Frequency observed

0              246                   7              71
1              549                   8              44
2              479                   9              25
3              379                  10               8
4              324                  11               4
5              241                  12               1
6              129

whereas the expectation of each of the 2,500 identical variables under
consideration by Prob. 13, page 181, is 30. By the same problem the
dispersion of $s$, that is, $E(s - 30)^2$, is 12.857. On the other hand, from
Table II we find that

$$\sum (s - 30)^2 = 31477$$

and

$$\frac{\sum (s - 30)^2}{2500} = 12.5908$$

fairly close to 12.857.
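The figures quoted from Tables I and II can be recomputed directly from the recorded frequencies. The sketch below only re-does that arithmetic; the dictionary names are chosen here, and the expected frequencies are taken as $25{,}000 \cdot C_6^k / 64$ since 25,000 tickets were drawn in all.

```python
from math import comb

# Table I: number on the ticket -> frequency observed (25,000 tickets in all)
TABLE_I = {0: 404, 1: 2321, 2: 5850, 3: 7863, 4: 5821, 5: 2344, 6: 397}
# Table II: |s - 30| -> frequency observed (2,500 drawings in all)
TABLE_II = {0: 246, 1: 549, 2: 479, 3: 379, 4: 324, 5: 241, 6: 129,
            7: 71, 8: 44, 9: 25, 10: 8, 11: 4, 12: 1}

total_of_sums = sum(number * freq for number, freq in TABLE_I.items())
mean_sum = total_of_sums / 2500

sum_sq = sum(d * d * freq for d, freq in TABLE_II.items())
observed_dispersion = sum_sq / 2500

expected_freq = {k: 25000 * comb(6, k) / 64 for k in range(7)}
discrepancy = {k: TABLE_I[k] - expected_freq[k] for k in range(7)}

print(total_of_sums, mean_sum)        # 74996 and 29.9984
print(sum_sq, observed_dispersion)    # 31477 and 12.5908
print(discrepancy)
```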

In the second experiment we tried to produce cards of every suit in $n$
drawings ($n$ being the smallest number required) of one card at a time,
each card taken being returned before the next drawing. By Prob. 14,
page 181, we find that the expectation and the dispersion of this number
$n$ are, respectively, $8\tfrac{1}{3}$ and 14.44. Altogether 3,000 values of $n$ were
recorded, of which 33 was the largest. Values of the difference $n - 8$ are
given in Table III.


Table III

n - 8    Frequency    n - 8    Frequency    n - 8    Frequency

-4         282          6         77         16          3
-3         420          7         50         17          5
-2         426          8         40         18          2
-1         407          9         31         19          1
 0         348         10         17         20          3
 1         247         11         15         21          1
 2         228         12         13         22          1
 3         156         13          6         23          1
 4         116         14          9         24          0
 5          88         15          6         25          1

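From this table we find

$$\sum (n - 8) = 965, \qquad \sum (n - 8)^2 = 43{,}395,$$

whence

$$\sum \left(n - 8\tfrac{1}{3}\right)^2 = \sum (n - 8)^2 - \tfrac{2}{3}\sum (n - 8) + \tfrac{3000}{9} = 43{,}085$$

$$\sum n = 24{,}965.$$

By the law of large numbers we may expect that the quotients

$$\frac{\sum n}{3000} \quad \text{and} \quad \frac{\sum \left(n - 8\tfrac{1}{3}\right)^2}{3000}$$

will not considerably differ from $8\tfrac{1}{3}$ and 14.44, respectively. As a
matter of fact,

$$\frac{\sum n}{3000} = 8.322, \qquad \frac{\sum \left(n - 8\tfrac{1}{3}\right)^2}{3000} = 14.362.$$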

There is a very satisfactory agreement between the theory and this 
experiment in another respect. Of 24,965 cards drawn there were 

6,304 hearts 
6,236 diamonds 
6,131 clubs 
6,294 spades 

whereas the expected number for each suit is 6241.25. 

7. So far, we have dealt with stochastic variables having only a finite
number of values. However, the notion of mathematical expectation,
and the propositions essentially based on this notion, can be extended to
variables with infinitely many values. Here we shall consider the
simplest case of variables with a countable set of values, that can be
arranged in a sequence

$$\cdots < \alpha_{-2} < \alpha_{-1} < \alpha_0 < \alpha_1 < \alpha_2 < \cdots$$

in the order of their magnitude.

With this sequence is associated the sequence of probabilities

$$\ldots, p_{-2}, p_{-1}, p_0, p_1, p_2, \ldots$$

so that in general $p_i$ is the probability for $x$ to assume the value $\alpha_i$.
These probabilities are subject to the condition that the series

$$\sum p_i = \cdots + p_{-2} + p_{-1} + p_0 + p_1 + p_2 + \cdots$$

must be convergent with the sum 1.

The definition of mathematical expectation is essentially the same
as that for variables with a finite number of values, but instead of a
finite sum, we have an infinite series

$$E(x) = \sum p_i\alpha_i$$

provided this series is convergent (it is absolutely convergent, if
convergent at all). If this series is divergent, it is meaningless to speak of
the mathematical expectation of $x$. Likewise, the mathematical
expectation of any function $\varphi(x)$ is defined as being the sum of the series

$$E\{\varphi(x)\} = \sum p_i\varphi(\alpha_i),$$

provided the latter is convergent.

It can easily be seen that various theorems established in Chap. IX,
as well as Tshebysheff's lemma, continue to hold when the various
mathematical expectations involved exist.

The law of large numbers follows, as a simple corollary, from
Tshebysheff's lemma if the following requirements are fulfilled:

a. Mathematical expectations of all variables $x_1, x_2, x_3, \ldots$ exist.

b. The dispersion $B_n$ of the sum $x_1 + x_2 + \cdots + x_n$ exists.

c. The quotient $B_n/n^2$ tends to 0 as $n$ tends to infinity.

The first requirement is absolutely indispensable. Without it the 
theorem itself cannot be stated. The second requirement (not to speak 
of the third) need not be fulfilled; and still the law of large numbers may 
hold, as Markoff pointed out. 

8. Let $x_1, x_2, x_3, \ldots$ be independent variables. If for every $i$
the mathematical expectation

$$E(x_i^2)$$

exists, the quantity $B_n$ exists also. But if at least one of these
expectations does not exist, the quantity $B_n$ has no meaning. However, the
following theorem, due to Markoff, holds:

Theorem. The law of large numbers holds, provided that for some
$\delta > 0$ all the mathematical expectations

$$E(|x_i|^{1+\delta}); \qquad i = 1, 2, 3, \ldots$$

exist and are bounded.

Proof. For the sake of simplicity we may assume that

$$E(x_i) = 0; \qquad i = 1, 2, 3, \ldots.$$

For, supposing

$$E(x_i) = a_i; \qquad i = 1, 2, 3, \ldots$$

instead of $x_i$, we may consider new variables

$$z_i = x_i - a_i.$$

Then

$$E(z_i) = 0$$

and it remains to prove the existence and boundedness of

$$E(|z_i|^{1+\delta}); \qquad i = 1, 2, 3, \ldots.$$

The proof follows immediately from the inequalities

$$|x_i - a_i|^{1+\delta} \leq 2^{\delta}\{|x_i|^{1+\delta} + |a_i|^{1+\delta}\}, \qquad |a_i|^{1+\delta} \leq E(|x_i|^{1+\delta})$$

the first of which is well known; the second is a particular case of
Liapounoff's inequality, established in Chap. XIII, page 265.

Thus, from the outset we are entitled to assume that

$$E(x_i) = 0.$$

The proof of the theorem is based on a very ingenious and useful
device due to Markoff. Let $N$ be a positive number which later we shall
increase indefinitely. Together with $x_i$ we shall consider two new
variables, $u_i$ and $v_i$, defined as follows: $\alpha$ being a particular value of $x_i$, the
corresponding values of $u_i$ and $v_i$ are

$$u_i = \alpha, \qquad v_i = 0$$

if $|\alpha| \leq N$ and

$$u_i = 0, \qquad v_i = \alpha$$

if $|\alpha| > N$. Thus, stochastic variables $u_i$ and $v_i$ are completely defined.
Evidently

$$x_i = u_i + v_i$$

whence

$$0 = E(u_i) + E(v_i)$$

and

$$\beta_i = E(u_i) = -E(v_i).$$

Now

$$E(|v_i|^{1+\delta}) \leq E(|x_i|^{1+\delta}) < c$$

by hypothesis. Since $v_i$ is either 0 or its absolute value is $> N$, we have

$$N^{\delta}E(|v_i|) \leq E(|v_i|^{1+\delta}) < c,$$

whence

$$|\beta_i| = |E(v_i)| < \frac{c}{N^{\delta}}. \tag{2}$$

Likewise, the probability $q_i$ for $v_i \neq 0$ satisfies the inequality

$$N^{1+\delta}q_i \leq E(|v_i|^{1+\delta}) < c$$

whence

$$q_i < \frac{c}{N^{1+\delta}}. \tag{3}$$
Now, let us consider two inequalities

$$\left|\frac{u_1 + u_2 + \cdots + u_n}{n}\right| < \sigma \tag{4}$$

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| < \sigma \tag{5}$$

where $\sigma$ is an arbitrary positive number and let $P_0$ and $P$ be their
respective probabilities. The inequalities (4) and (5) coincide when

$$v_1 = v_2 = \cdots = v_n = 0.$$

With this supplementary condition they have the same probability $Q$.
But they can hold also when at least one of the numbers

$$v_1, v_2, \ldots, v_n$$

is different from 0. Let the probabilities of (4) and (5) under such
circumstances be $R_0$ and $R$. Then

$$P_0 = Q + R_0, \qquad P = Q + R.$$

But evidently neither $R_0$ nor $R$ can exceed the probability that in the
series

$$v_1, v_2, \ldots, v_n$$

at least one number is different from 0; this probability in turn does not
exceed (see Chap. II, page 30)

$$q_1 + q_2 + \cdots + q_n < \frac{nc}{N^{1+\delta}}.$$

Hence

$$R_0 < \frac{nc}{N^{1+\delta}}, \qquad R < \frac{nc}{N^{1+\delta}}$$

and

$$|P - P_0| < \frac{nc}{N^{1+\delta}}. \tag{6}$$

On the other hand, since none of the values of $u_i$ ($i = 1, 2, \ldots, n$)
numerically exceeds $N$, we have

$$E(u_i^2) \leq N^{1-\delta}E(|u_i|^{1+\delta}) < cN^{1-\delta}.$$

Accordingly, the dispersion of the sum $u_1 + u_2 + \cdots + u_n$ will be
less than

$$cnN^{1-\delta}.$$

Hence, by what has been proved in Sec. 2, the probability of the
inequality

$$\left|\frac{u_1 + u_2 + \cdots + u_n}{n} - \frac{\beta_1 + \beta_2 + \cdots + \beta_n}{n}\right| \leq \frac{\varepsilon}{2} \tag{7}$$

is greater than

$$1 - \frac{4cN^{1-\delta}}{\varepsilon^2 n}.$$

But whenever (7) is satisfied, the inequality

$$\left|\frac{u_1 + u_2 + \cdots + u_n}{n}\right| \leq \frac{\varepsilon}{2} + \frac{|\beta_1 + \beta_2 + \cdots + \beta_n|}{n} \tag{8}$$

is also satisfied. Hence, the probability of this inequality is a fortiori
greater than

$$1 - \frac{4cN^{1-\delta}}{\varepsilon^2 n}.$$

Owing to inequalities (2), the following inequality follows from (8):

$$\left|\frac{u_1 + u_2 + \cdots + u_n}{n}\right| < \frac{\varepsilon}{2} + \frac{c}{N^{\delta}}.$$

Hence, taking $\sigma = \varepsilon/2 + c/N^{\delta}$ in (4) and (5),

$$P_0 > 1 - \frac{4cN^{1-\delta}}{\varepsilon^2 n}$$

and on account of (6)

$$P > 1 - \frac{4cN^{1-\delta}}{\varepsilon^2 n} - \frac{nc}{N^{1+\delta}}.$$

Now we can dispose of the arbitrary number $N$ by taking

$$N = \frac{n\varepsilon}{2}.$$

Then

$$P > 1 - 2c\left(\frac{2}{\varepsilon}\right)^{1+\delta} n^{-\delta}.$$

Now $N$ tends to infinity with $n$ and as soon as $n$ surpasses a certain
limit $n_0$, the fraction

$$\frac{c}{N^{\delta}}$$

will become and remain less than $\varepsilon/2$. The probability of the inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| < \varepsilon$$

for $n > n_0$ will be greater than $P$ and consequently greater than

$$1 - 2c\left(\frac{2}{\varepsilon}\right)^{1+\delta} n^{-\delta}.$$

It tends, therefore, to 1 as $n$ tends to infinity, and that proves Markoff's
theorem.

Example. Let the possible values of the variable $x_p$ ($p = 1, 2, 3, \ldots$) be
-f l)i, + 1)1, p-\p -f 1)3, . . . 

with the corresponding probabilities 


Since the series 


V _^ V 

(p 4- 1)V Ip + 1)3’ ■ ‘ ‘ * 


1 

P 


+ - +- 
V V 


+ . .. 


is divergent, the mathematical expectation 


E{xl) 


does not exist. Yet the law of large numbers holds. For 


£(ixpr«) 



n = l (V + 1) 




is a convergent series for any 0 < S < 1. Moreover, 


I _jo ~ 

(/> + l)2 2 2 


and consequently the conditions of Markoff's theorem are satisfied for any $0 < \delta < 1$.
Hence, the law of large numbers holds in this example.


9. If variables $x_1, x_2, x_3, \ldots$ are identical, the law of large numbers
holds without any other restrictions, except that for these variables
mathematical expectations exist. In fact, Khintchine proved the following
theorem:

Theorem. If, as we may naturally suppose, $E(x_i) = 0$, the probability
of the inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| < \varepsilon$$

tends to 1 as $n$ increases indefinitely.

Proof. The proof is quite similar to that of Markoff's theorem and
is based on the same ingenious artifice. Let

$$\cdots < \alpha_{-2} < \alpha_{-1} < \alpha_0 < \alpha_1 < \alpha_2 < \cdots$$

be different values of any one of the identical variables $x_1, x_2, x_3, \ldots$ and

$$\ldots, p_{-2}, p_{-1}, p_0, p_1, p_2, \ldots$$

their probabilities. By hypothesis

$$\sum p_i\alpha_i$$

is a convergent series with the sum 0. The series

$$\sum p_i|\alpha_i|$$

is also convergent; let $c > 0$ be its sum.

Keeping the same notations as before, we have

$$|\beta_i| \leq E(|v_i|) = \sum_{|\alpha_i| > N} p_i|\alpha_i| = \psi(N)$$

where $\psi(N)$ is a decreasing function tending to 0 as $N \to \infty$. Also

$$E(u_i^2) \leq NE|x_i| = cN$$

so that the dispersion of the sum

$$u_1 + u_2 + \cdots + u_n$$

is less than

$$cNn.$$
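Consequently the probability of the inequality

$$\left|\frac{u_1 + u_2 + \cdots + u_n}{n} - \frac{\beta_1 + \beta_2 + \cdots + \beta_n}{n}\right| \leq \frac{\varepsilon}{2} \tag{9}$$

is greater than

$$1 - \frac{4cN}{\varepsilon^2 n}.$$

On the other hand, the probability $q_i$ of the inequality $v_i \neq 0$ is less
than

$$\frac{\psi(N)}{N}$$

because

$$N\sum_{|\alpha_i| > N} p_i \leq \sum_{|\alpha_i| > N} p_i|\alpha_i| = \psi(N)$$

and

$$q_i = \sum_{|\alpha_i| > N} p_i.$$

Hence, the difference between the probability of the inequality

$$\left|\frac{u_1 + u_2 + \cdots + u_n}{n}\right| < \sigma$$

and that of the inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| < \sigma$$

is numerically less than

$$\frac{n\psi(N)}{N}.$$

As in the preceding section we conclude that the probability of the
inequality

$$\left|\frac{u_1 + u_2 + \cdots + u_n}{n}\right| \leq \frac{\varepsilon}{2} + \psi(N)$$

is greater than

$$1 - \frac{4cN}{\varepsilon^2 n}.$$

Finally, the probability of the inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| \leq \frac{\varepsilon}{2} + \psi(N) \tag{10}$$

is greater than

$$1 - \frac{4cN}{\varepsilon^2 n} - \frac{n\psi(N)}{N}.$$

To dispose of $N$ we observe that the ratio

$$\frac{\sqrt{\psi(N)}}{N}$$

is a decreasing function of $N$ and tends to 0 as $N \to \infty$. Hence, at least
for large $n$, there exists an integer $N$ such that

$$\frac{\sqrt{\psi(N)}}{N} < \frac{\sqrt{4c}}{\varepsilon n} \leq \frac{\sqrt{\psi(N-1)}}{N-1}.$$

Then

$$\frac{n\psi(N)}{N} < \frac{\sqrt{4c}}{\varepsilon}\sqrt{\psi(N)}, \qquad \frac{4cN}{\varepsilon^2 n} \leq \frac{\sqrt{4c}}{\varepsilon}\cdot\frac{N}{N-1}\sqrt{\psi(N-1)}$$

whence it follows that the probability of inequality (10) is greater than

$$1 - \frac{\sqrt{4c}}{\varepsilon}\left[\sqrt{\psi(N)} + \frac{N}{N-1}\sqrt{\psi(N-1)}\right].$$

Now $N$ increases indefinitely together with $n$; therefore, for all $n$
above a certain limit $n_0$,

$$\psi(N) < \frac{\varepsilon}{2}$$

so that for $n > n_0$ the probability of the inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| < \varepsilon$$

will be greater than

$$1 - \frac{\sqrt{4c}}{\varepsilon}\left[\sqrt{\psi(N)} + \frac{N}{N-1}\sqrt{\psi(N-1)}\right]$$

and with indefinitely increasing $n$ will approach the limit 1. Thus
Khintchine's theorem is completely proved.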

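Example. Let

$$2^{1 - 2\log 1}, \quad 2^{2 - 2\log 2}, \quad 2^{3 - 2\log 3}, \quad \ldots, \quad 2^{n - 2\log n}, \quad \ldots$$

be all possible values of identical variables $x_1, x_2, x_3, \ldots$ and

$$\frac{1}{2}, \quad \frac{1}{2^2}, \quad \frac{1}{2^3}, \quad \ldots, \quad \frac{1}{2^n}, \quad \ldots$$

their corresponding probabilities. Since the series

$$\frac{1}{2^{2\log 1}} + \frac{1}{2^{2\log 2}} + \frac{1}{2^{2\log 3}} + \cdots = 1 + \frac{1}{2^{\log 4}} + \frac{1}{3^{\log 4}} + \cdots$$

is convergent, mathematical expectations of the variables $x_1, x_2, x_3, \ldots$ exist.
Hence, the law of large numbers holds in this case.

Markoff's theorem cannot be applied here, because for any positive $\delta$ the series

$$\sum_{n=1}^{\infty} \frac{2^{n\delta}}{n^{(1+\delta)\log 4}}$$

is divergent.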



2»5 

^(1+5)1o«4 


1 


Problems for Solution 

1. Let x be a stochastic variable with the mean = 0 and the standard deviation $\sigma$. 
Denoting by P(t) the probability of the inequality 

$$x \le t$$

show that 

$$P(t) \le \frac{\sigma^2}{\sigma^2 + t^2} \quad \text{for } t < 0$$

$$1 - P(t) \le \frac{\sigma^2}{\sigma^2 + t^2} \quad \text{for } t > 0.$$

Show also that the right-hand members cannot be replaced by smaller numbers. 
Indication of the Proof. Since 

$$\sum p_i x_i = 0, \qquad \sum p_i x_i^2 = \sigma^2,$$

we have also 

$$\sum p_i (x_i - t) = -t, \qquad \sum p_i (x_i - t)^2 = \sigma^2 + t^2$$

whence, supposing that $x_i > t$ for i = 1, 2, . . . s and first taking t negative, 

$$t^2 \le \Big[\sum_{i=1}^{s} p_i (x_i - t)\Big]^2 \le \sum_{i=1}^{s} p_i \cdot \sum_{i=1}^{s} p_i (x_i - t)^2 \le (1 - P(t))(\sigma^2 + t^2)$$

$$1 - P(t) \ge \frac{t^2}{\sigma^2 + t^2}.$$

For positive t the proof is quite similar. Considering a stochastic variable with 
two values: 

$$x_1 = t, \quad p_1 = \frac{\sigma^2}{\sigma^2 + t^2}; \qquad x_2 = -\frac{\sigma^2}{t}, \quad p_2 = \frac{t^2}{\sigma^2 + t^2}$$

one can easily prove the last part of our statement. 
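
The two-valued variable of Problem 1 can be checked with exact rational arithmetic. The following is only an illustrative sketch: the function names and the sample values $\sigma^2 = 4$, $t = -3$ are assumptions made for the example and do not come from the text.

```python
from fractions import Fraction

def two_valued_variable(sigma2, t):
    """Values and probabilities of the two-valued variable from Problem 1."""
    sigma2, t = Fraction(sigma2), Fraction(t)
    denom = sigma2 + t * t
    return [(t, sigma2 / denom), (-sigma2 / t, t * t / denom)]

def mean_and_second_moment(dist):
    """Return E(x) and E(x^2) for a list of (value, probability) pairs."""
    mean = sum(p * x for x, p in dist)
    second = sum(p * x * x for x, p in dist)
    return mean, second

def prob_not_exceeding(dist, t):
    """P(t): probability of the inequality x <= t."""
    return sum(p for x, p in dist if x <= t)

# Illustrative (assumed) values, with t negative.
sigma2, t = Fraction(4), Fraction(-3)
dist = two_valued_variable(sigma2, t)
mean, second = mean_and_second_moment(dist)
bound = sigma2 / (sigma2 + t * t)
print(mean, second, prob_not_exceeding(dist, t), bound)
```

For a negative t the variable has mean 0, second moment $\sigma^2$, and P(t) equal to the stated bound, so the bound cannot be lowered.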

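2. Tshebysheff's Problem.¹ If x is a positive stochastic variable with given 

$$E(x) = \sigma^2, \qquad E(x^2) = \tau^4$$

then the probability P of the inequality 

$$x \ge v$$

has the following precise upper bounds: 

$$P \le 1 \quad \text{for } v \le \sigma^2$$

$$P \le \frac{\sigma^2}{v} \quad \text{for } \sigma^2 \le v \le \frac{\tau^4}{\sigma^2}$$

$$P \le \frac{\tau^4 - \sigma^4}{\tau^4 + v^2 - 2\sigma^2 v} \quad \text{for } v \ge \frac{\tau^4}{\sigma^2}.$$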

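Indication of the Proof. Let 

$$\xi = \frac{\sigma^2 v - \tau^4}{v - \sigma^2}.$$

Then $0 \le \xi < v$ if $v \ge \tau^4/\sigma^2$ and 

$$\Big(\frac{x - \xi}{v - \xi}\Big)^2 \ge 1$$

for $x \ge v$. On the other hand, 

$$E\Big(\frac{x - \xi}{v - \xi}\Big)^2 = \frac{\tau^4 - \sigma^4}{\tau^4 + v^2 - 2\sigma^2 v}$$

whence 

$$P \le \frac{\tau^4 - \sigma^4}{\tau^4 + v^2 - 2\sigma^2 v}.$$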

¹ Sur les valeurs limites des intégrales. Jour. Liouville, Ser. 2, T. XIX, 1874. 
The equality sign is reached for the stochastic variable with two values: 

$$x_1 = \frac{\sigma^2 v - \tau^4}{v - \sigma^2}, \quad p_1 = \frac{(v - \sigma^2)^2}{\tau^4 + v^2 - 2\sigma^2 v}$$

$$x_2 = v, \quad p_2 = \frac{\tau^4 - \sigma^4}{\tau^4 + v^2 - 2\sigma^2 v}.$$

If $\sigma^2 \le v < \tau^4/\sigma^2$ we have an obvious inequality 

$$P \le \frac{\sigma^2}{v}.$$

To show that the right-hand member cannot be replaced by a smaller number, consider 
the following stochastic variable with three values: 

$$x_1 = 0, \quad p_1 = \frac{(l - \sigma^2)v - l\sigma^2 + \tau^4}{lv}$$

$$x_2 = v, \quad p_2 = \frac{l\sigma^2 - \tau^4}{v(l - v)}$$

$$x_3 = l, \quad p_3 = \frac{\tau^4 - \sigma^2 v}{l(l - v)}$$

where $l > v$ is an arbitrary number. For this variable 

$$P = p_2 + p_3 = \frac{\sigma^2}{v} - \frac{\tau^4 - \sigma^2 v}{lv}$$

is arbitrarily near to $\sigma^2/v$ for sufficiently large l. 
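
The three regimes of Tshebysheff's problem can be put into one small function and checked against a two-valued variable that reaches the bound. This is a hedged sketch only: the function names and the sample values $\sigma^2 = 1$, $\tau^4 = 2$, $v = 3$ are assumptions chosen for illustration.

```python
from fractions import Fraction

def tshebysheff_bound(sigma2, tau4, v):
    """Upper bound for P(x >= v) when x > 0, E(x) = sigma2, E(x^2) = tau4."""
    sigma2, tau4, v = Fraction(sigma2), Fraction(tau4), Fraction(v)
    if v <= sigma2:
        return Fraction(1)
    if v <= tau4 / sigma2:
        return sigma2 / v
    return (tau4 - sigma2 ** 2) / (tau4 + v * v - 2 * sigma2 * v)

def extremal_two_valued(sigma2, tau4, v):
    """Two-valued variable reaching the bound when v >= tau4 / sigma2."""
    sigma2, tau4, v = Fraction(sigma2), Fraction(tau4), Fraction(v)
    denom = tau4 + v * v - 2 * sigma2 * v
    low = (sigma2 * v - tau4) / (v - sigma2)
    return [(low, (v - sigma2) ** 2 / denom), (v, (tau4 - sigma2 ** 2) / denom)]

# Assumed sample values: E(x) = 1, E(x^2) = 2, threshold v = 3.
dist = extremal_two_valued(1, 2, 3)
first = sum(p * x for x, p in dist)
second = sum(p * x * x for x, p in dist)
tail = sum(p for x, p in dist if x >= 3)
print(first, second, tail, tshebysheff_bound(1, 2, 3))
```

The two moments come out as prescribed and the tail probability coincides with the bound, which is the sense in which the bound is precise.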

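3. If x is an arbitrary stochastic variable with given 

$$E(x^2) = \sigma^2, \qquad E(x^4) = \tau^4$$

and P denotes the probability of the inequality 

$$|x| \ge k\sigma,$$

then 

$$P \le 1 \ \text{ for } k \le 1; \qquad P \le \frac{1}{k^2} \ \text{ for } 1 \le k \le \frac{\tau^2}{\sigma^2}; \qquad P \le \frac{\tau^4 - \sigma^4}{\tau^4 + k^4\sigma^4 - 2k^2\sigma^4} \ \text{ for } k \ge \frac{\tau^2}{\sigma^2}.$$

These inequalities cannot be improved. 

Hint: Follows from Tshebysheff's problem. 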

4. Let $x_i$ assume two values, i and -i, with equal probabilities. Show that the 
law of large numbers cannot be applied to variables $x_1, x_2, x_3, \ldots$. 

5. Variables $x_1, x_2, x_3, \ldots$ each assume two values: 
log a or -log a; log (a + 1) or -log (a + 1); log (a + 2) or -log (a + 2); . . . 
with equal probabilities. Show that the law of large numbers holds for these variables. 

Hint: $E(x_i) = 0$; i = 1, 2, 3, . . . 

$$B_n = E(x_1 + x_2 + \cdots + x_n)^2 = \sum_{i=0}^{n-1} [\log (a + i)]^2 \sim (a + n - 1)[\log (a + n - 1)]^2$$

as can easily be established by using Euler's summation formula (Appendix 1, page 
347). Hence 

$$\frac{B_n}{n^2} \to 0 \quad \text{as } n \to \infty.$$

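6. If $x_i$ can have only two values with equal probabilities, $i^\alpha$ and $-i^\alpha$, show that the 
law of large numbers can be applied to $x_1, x_2, \ldots$ if $\alpha < \tfrac12$. 

Hint: 

$$B_n = 1^{2\alpha} + 2^{2\alpha} + \cdots + n^{2\alpha} \sim \frac{n^{2\alpha+1}}{2\alpha+1}, \qquad \frac{B_n}{n^2} \to 0 \quad \text{if } \alpha < \frac12.$$

It can be shown that the law of large numbers does not hold if $\alpha \ge \tfrac12$. 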

7. In an indefinite Bernoullian series of trials with the constant probability p, 
let $m_i$ denote the number of successes in the first i trials. Show that the law of large 
numbers holds for variables 

$$x_i = \frac{m_i - ip}{(ipq)^\alpha}, \qquad i = 1, 2, 3, \ldots$$

if $\alpha > \tfrac12$. 

Hint: Evidently $E(x_i) = 0$, $E(x_i^2) = (ipq)^{1-2\alpha}$ and 

$$B_n = \sum_{i=1}^{n} (ipq)^{1-2\alpha} + 2\sum_{j>i} E(x_i x_j).$$

Now 

$$E(x_i x_j) = (ij)^{-\alpha}(pq)^{-2\alpha} E(m_i - ip)^2 + (ij)^{-\alpha}(pq)^{-2\alpha} E[(m_i - ip)(m_j - m_i - (j - i)p)] = (pq)^{1-2\alpha}\, i^{1-\alpha} j^{-\alpha}$$

since $m_i - ip$ and $m_j - m_i - (j - i)p$ are independent variables. Thus 

$$B_n = (pq)^{1-2\alpha}\Big[\sum_{i=1}^{n} i^{1-2\alpha} + 2\sum_{j>i} i^{1-\alpha} j^{-\alpha}\Big]$$

and it is easy to show that 

$$\frac{B_n}{n^2} \to 0 \quad \text{as } n \to \infty$$

provided $\alpha > \tfrac12$. But the law of large numbers no longer holds if $\alpha \le \tfrac12$. The 
proof of this is more difficult. 

8. The following extension of Tshebysheff's lemma was indicated by Kolmogoroff. 
Let $x_1, x_2, \ldots x_n$ be independent variables; $E(x_i) = 0$, $E(x_i^2) = b_i$, 

$$B_n = b_1 + b_2 + \cdots + b_n$$

and 

$$s_k = x_1 + x_2 + \cdots + x_k; \qquad k = 1, 2, \ldots n.$$

Denoting by P the probability of the inequality 

$$\max(s_1^2, s_2^2, \ldots s_n^2) > B_n t^2, \tag{A}$$

we shall have $P < 1/t^2$. 

Indication of the Proof. The inequality (A) can materialize if and only if one of 
the following mutually exclusive events occurs: 

event $e_1$: $s_1^2 > B_n t^2$; 

event $e_2$: $s_1^2 \le B_n t^2$; $s_2^2 > B_n t^2$; 

event $e_3$: $s_1^2 \le B_n t^2$; $s_2^2 \le B_n t^2$; $s_3^2 > B_n t^2$; 

event $e_n$: $s_1^2 \le B_n t^2$; $s_2^2 \le B_n t^2$; . . . $s_{n-1}^2 \le B_n t^2$; $s_n^2 > B_n t^2$. 

If $(e_i)$ represents the probability of $e_i$ (i = 1, 2, . . . n) then 

$$P = (e_1) + (e_2) + \cdots + (e_n).$$

Now consider the conditional mathematical expectation $E(s_n^2 | e_k)$ of $s_n^2$ given that 
$e_k$ has occurred. Since the indication of $e_k$ does not affect variables $x_{k+1}, x_{k+2}, \ldots x_n$, 
these variables and $s_k$ are independent. Hence 

$$E(s_n^2 | e_k) = E(s_k^2 | e_k) + b_{k+1} + \cdots + b_n > B_n t^2.$$

On the other hand 

$$B_n = E(s_n^2) \ge \sum_{k=1}^{n} (e_k) E(s_n^2 | e_k) > B_n t^2 ((e_1) + (e_2) + \cdots + (e_n))$$

whence $P < 1/t^2$. 
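
Kolmogoroff's lemma can be checked exactly in a small case by enumerating all outcomes. The sketch below assumes, purely for illustration, n = 10 variables each equal to +1 or -1 with probability 1/2, so that $b_i = 1$ and $B_n = n$; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def kolmogoroff_probability(n, t2):
    """Exact P(max_k s_k^2 > B_n * t2) for n independent +1/-1 variables.

    Each variable has mean 0 and dispersion 1, hence B_n = n.
    """
    hits = 0
    for signs in product((-1, 1), repeat=n):
        s = 0
        for x in signs:
            s += x
            if s * s > n * t2:      # inequality (A) already holds
                hits += 1
                break
    return Fraction(hits, 2 ** n)

for t2 in (2, 3, 4):
    print(t2, kolmogoroff_probability(10, t2), Fraction(1, t2))
```

In each case the exact probability stays below $1/t^2$, as the lemma requires; the enumeration is feasible only for small n.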

9 . The Strong Law of Large Numbers (Kolmogoroff). Using the same notations 
as in the preceding problem, show that the probability of the simultaneous inequalities 


- < € 


1 i^n+2 

n 

n + 1 = ’ 

\n 4- 2 


will be greater than $1 - \eta$, provided n exceeds a certain limit depending on the choice 
of $\epsilon$ and $\eta$, and granted the convergence of the series 



Indication of the Proof. Consider variables 

n = max. - ^for Pi = ^ m < 2^n; z == 1, 2, 3, • . . 

and denote by gi the probabiUty of the inequality t» > 3^2 €. By Kolmogoroff'a 
lemma 

l = 2»n — 1 

4 X ^ 

2 »‘“* n *€2 


< 
and 


g. + + 9. + • • • < X 

» = 1 l — 2*~ *n 


00 Z « 2*n — 1 

< 16 .-X X 


i^l 




or 

ec 

Qi Q2 A' Q:i ' < 1 () 6~2 — . 

k — n 

Hence, the probability of fulfillment of all the inc(iiialities t» ^ ^ 2 ^; z = 1, 2, 3, . . . 
is greater than 


1 

^ /C2 

k — n 

The inequalities \sk/k\ ^ e; k = n, n + 1, n + 2, . . 
taneously 

Ti ^ i = 1, 2, 3, . . 


. are satisfied when simul- 


and 


«n~l 

n 


1 

< -€. 
“ 2 


4Bn 

The probability of the last inequality being greater than 1-the probability 

of simultaneous inequalities 

^ €; A: = n, w -f- 1, w + 2, . . . 
a fortiori will be greater than 


k 



ABn 


This inequality suffices to complete the proof if we notice that $B_n/n^2$ tends to 0 when 
the series 

$$\sum_{k=1}^{\infty} \frac{b_k}{k^2}$$

is convergent. 

10. Let Xu X 2 , . . . Xn be identical stochastic variables and E{xi) - 0. 
by Pn{t) and /\(e), respectively, the probabilities of the inequalities 


Denoting 


Xi A- X2 At • 
n 


show that 


+ Xn 


> 6 


and 


xi A' Xz A- ' ' ' At Xn 
n 


< 


lim = 0 or 4“ 00 

n« ^Pn{t) 


€ 


according as E{x\) > or <0. 
For the proof see Khintchine's paper in Mathematische Annalen (vol. 101, pp. 381-385). 

11. The Law of the Repeated Logarithm (Khintchine, Kolmogoroff). Let $x_1, x_2, \ldots x_n$ 
be bounded independent variables, $E(x_i) = 0$, i = 1, 2, . . . n and $B_n \to \infty$ 
as $n \to \infty$. For an arbitrarily small $\delta > 0$ and $\epsilon > 0$ and for an arbitrarily large N 
one can choose $n_0 > N$ so that: 

a. The probability of the fulfillment of the inequality 

$$|s_n| > (1 + \delta)\sqrt{2B_n \log \log B_n}$$

for at least one $n \ge n_0$ is less than $\epsilon$. 

b. The probability of the fulfillment of the inequality 

$$|s_n| > (1 - \delta)\sqrt{2B_n \log \log B_n}$$

for at least one $n \ge n_0$ is greater than $1 - \epsilon$. 

For the proof see Kolmogoroff's paper in Mathematische Annalen (vol. 101, pp. 126-135). 

If $x_1, x_2, \ldots x_n$ are variables independent in pairs and $B_n$ the dispersion of their 
sum $s = x_1 + x_2 + \cdots + x_n$, then the probability P that 

$$|s| \le t\sqrt{B_n}$$

satisfies the inequality 

$$P > 1 - \frac{1}{t^2} \quad \text{(Tshebysheff's inequality)}$$

provided $E(x_i) = 0$, i = 1, 2, . . . n, which can be assumed without loss of generality. 
In case variables are totally independent and are subject to certain limitations of 
comparatively mild character, S. Bernstein has shown that Tshebysheff's inequality can be 
considerably improved. 

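12. Let $x_1, x_2, \ldots x_n$ be totally independent variables. We suppose $E(x_i) = 0$, 
$E(x_i^2) = b_i$ and 

$$E|x_i|^h \le \frac{b_i}{2}\, h!\, c^{h-2}$$

for i = 1, 2, . . . n and h > 2, c being a certain constant. Show that 

$$E\big(e^{\epsilon(x_1 + x_2 + \cdots + x_n)}\big) \le e^{\frac{B_n \epsilon^2}{2(1 - \sigma)}}$$

where $\sigma$ is an arbitrary positive number < 1 and $\epsilon$ is a positive number so small that 

$$\epsilon c \le \sigma.$$

Indication of the Proof. We have 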
13. If Q denotes the probability of the inequality 

$$x_1 + x_2 + \cdots + x_n > \frac{B_n \epsilon}{2(1 - \sigma)} + \frac{t^2}{\epsilon}$$

show that $Q < e^{-t^2}$. 

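Indication of the Proof. If Q' is the probability of the inequality 

$$e^{\epsilon(x_1 + x_2 + \cdots + x_n)} > e^{t^2} E\big(e^{\epsilon(x_1 + x_2 + \cdots + x_n)}\big)$$

then, by Tshebysheff's lemma, $Q' < e^{-t^2}$ and $Q \le Q'$ by Prob. 12. 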

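14. S. Bernstein's Inequality. Denoting by P the probability of the inequality 

$$|x_1 + x_2 + \cdots + x_n| \le \omega,$$

$\omega$ being a given positive number, show that 

$$P > 1 - 2e^{-\frac{\omega^2}{2B_n + 2c\omega}}.$$

Indication of the Proof. To make $\frac{B_n\epsilon}{2(1-\sigma)} + \frac{t^2}{\epsilon} = F$ minimum take $\epsilon = t\sqrt{\frac{2(1-\sigma)}{B_n}}$; 
then $F = t\sqrt{\frac{2B_n}{1-\sigma}}$, and t is determined by equating F to $\omega$. The resulting value of $\epsilon$, 

$$\epsilon = \frac{\omega(1-\sigma)}{B_n},$$

is admissible only if $\epsilon c \le \sigma$ or $\frac{c\omega}{B_n}(1-\sigma) \le \sigma$. The best choice for $\sigma$ is $\sigma = \frac{c\omega}{B_n + c\omega}$ 
and correspondingly $t = \frac{\omega}{\sqrt{2B_n + 2c\omega}}$. 

By Prob. 13 the probability of the inequality 

$$x_1 + x_2 + \cdots + x_n > \omega$$

is less than $e^{-\frac{\omega^2}{2B_n + 2c\omega}}$. The same is true of the probability of the inequality 

$$x_1 + x_2 + \cdots + x_n < -\omega \quad \text{or} \quad -x_1 - x_2 - \cdots - x_n > \omega.$$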

15. If variables $x_1, x_2, \ldots x_n$ are uniformly bounded and M is an upper bound 
of their numerical values, then we may take c = M/3. 

Indication of the Proof. Note that 


E{\xi\^) ^ 






16. Consider a Poisson's series of trials with probabilities $p_1, p_2, \ldots p_n$ for an 
event E to occur. Let m be the frequency of E in n trials, $p = \frac{p_1 + p_2 + \cdots + p_n}{n}$, 
$\lambda = \frac{1}{n}(p_1 q_1 + p_2 q_2 + \cdots + p_n q_n)$. Show that the probability P of the inequality 
$\left|\frac{m}{n} - p\right| \le \epsilon$ has the following lower limit: 

$$P > 1 - 2e^{-\frac{n\epsilon^2}{2\lambda + \frac{2}{3}\epsilon}}.$$

In the Bernoullian case $p_1 = p_2 = \cdots = p_n$, $\lambda = pq$ and consequently 

$$P > 1 - 2e^{-\frac{n\epsilon^2}{2pq + \frac{2}{3}\epsilon}}.$$
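
To see how much Bernstein's inequality gains over Tshebysheff's in the Bernoullian case, the two lower limits can be evaluated side by side. This is a minimal sketch; the function names and the sample values n = 10,000, $\epsilon$ = 0.02, pq = 1/4 are assumptions chosen for illustration, and the Bernstein exponent is taken as $n\epsilon^2/(2\lambda + \tfrac{2}{3}\epsilon)$, which is what Prob. 14 gives with c = 1/3.

```python
import math

def bernstein_lower_limit(n, eps, lam):
    """1 - 2 exp(-n eps^2 / (2 lam + (2/3) eps)), as in Prob. 16."""
    return 1.0 - 2.0 * math.exp(-n * eps ** 2 / (2.0 * lam + 2.0 * eps / 3.0))

def tshebysheff_lower_limit(n, eps, lam):
    """1 - lam / (n eps^2), from Tshebysheff's inequality with B_n = n lam."""
    return 1.0 - lam / (n * eps ** 2)

# Assumed sample values: 10,000 Bernoullian trials with p = q = 1/2.
n, eps, lam = 10_000, 0.02, 0.25
print(tshebysheff_lower_limit(n, eps, lam), bernstein_lower_limit(n, eps, lam))
```

With these values Tshebysheff's inequality guarantees only 0.9375, while Bernstein's lower limit is about 0.9992.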


17. An indefinite series of totally independent variables $x_1, x_2, x_3, \ldots$ has the 
property that the mathematical expectation of any odd power of these variables is 
rigorously = 0 while 

$$E(x_i^{2k}) \le \left(\frac{b_i}{2}\right)^k \frac{(2k)!}{k!}, \qquad b_i = E(x_i^2)$$

for k = 1, 2, 3, . . . . Prove that the probability of either one of the inequalities 

$$x_1 + x_2 + \cdots + x_n > t\sqrt{2B_n} \quad \text{or} \quad x_1 + x_2 + \cdots + x_n < -t\sqrt{2B_n}$$

where $B_n = b_1 + b_2 + \cdots + b_n$ is less than $e^{-t^2}$ (S. Bernstein). Prove first that 

$$E\left(e^{\epsilon(x_1 + x_2 + \cdots + x_n)}\right) \le e^{\frac{B_n\epsilon^2}{2}}.$$

18. Positive and negative proper decimal fractions limited to, say, five decimals, 
are obtained in the following manner: From an urn containing tickets with numbers 
0, 1, 2, . . . 9 in equal proportion, five tickets are drawn in succession (the ticket 
drawn in a previous trial being returned before the next) and their respective numbers 
are written in succession as five decimals of a proper fraction. This fraction, if not 
equal to 0, is preceded by the sign + or -, according as a coin tossed at the same time 
shows heads or tails. Thus, repeating this process several times, we may obtain as 
many positive or negative proper fractions with five decimals as we desire. What 
can be said about the probability that the sum of n such fractions will be contained 
between prescribed limits $-\omega$ and $\omega$? Ans. These n fractions may be considered as 
so many identical stochastic variables for each of which 


Besides, 


(1 - 10 "^) (2 - 10 -- 6 ) 1 
= 0 , /3 = E{x^) - ^- - -- < 

6 3 


106-1 


X 


= 4~Tr < • ^ 


2 * + 1 


since in general 


12* 4. 22* + . . . + (5 - 1)2* < 


2k + 1 


Again, the inequality 




can easily be verified and we can apply the result of Prob. 17. For the required 
probability P the following lower limit can be obtained: 




P > 1 - 2e >1 - 2e ; 
or, if $\omega = n\epsilon$, 

$$P > 1 - 2e^{-\frac{3}{2}n\epsilon^2}.$$

For example, if $\epsilon = 1/10$ and n = 814, 

$$P > 0.99999,$$

that is, almost certainly the sum of 814 fractions formed in the above described manner 
will be contained between -82 and 82. 
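
The figure 814 can be reproduced numerically. The sketch below assumes the lower limit has the form $1 - 2e^{-\frac{3}{2}n\epsilon^2}$ and searches for the smallest n that pushes it above 0.99999 when $\epsilon = 1/10$; the function names are hypothetical.

```python
import math

def lower_limit(n, eps):
    """Lower limit 1 - 2 exp(-(3/2) n eps^2) for the probability in Prob. 18."""
    return 1.0 - 2.0 * math.exp(-1.5 * n * eps * eps)

def smallest_n(eps, target):
    """Smallest n for which the lower limit exceeds the target probability."""
    n = 1
    while lower_limit(n, eps) <= target:
        n += 1
    return n

print(smallest_n(0.1, 0.99999), lower_limit(814, 0.1))
```

Under this assumption the search returns 814, and $n\epsilon = 81.4$ explains the limits -82 and 82.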
CHAPTER XI 


APPLICATIONS OF THE LAW OF LARGE NUMBERS 

1. A theorem of such wide generality as the law of large numbers is a 
source of a great many important particular theorems. We shall begin 
with a generalization of Bernoulli's theorem due to Poisson. 

Let us consider a series of independent trials with the respective 
probabilities $p_1, p_2, p_3, \ldots$, varying from one trial to another. Considering 
n trials, we shall denote by m the number of successes. The 
arithmetic mean of probabilities in n trials 

$$p = \frac{p_1 + p_2 + \cdots + p_n}{n}$$

will be called the "mean probability in n trials." With such conditions 
and notations adopted, we can state Poisson's theorem as follows: 

Poisson's Theorem. The probability of the inequality 

$$\left|\frac{m}{n} - p\right| < \epsilon$$

for fixed $\epsilon > 0$, no matter how small, can be made as near to 1 (certainty) as 
we please, provided the number of trials n is sufficiently large. 

Proof. To show that this theorem is but a particular case of the law 
of large numbers, we use an artifice often applied in similar circumstances, 
namely, we associate with trials 1, 2, 3, . . . n variables $x_1, x_2, x_3, \ldots x_n$ 
defined as follows: 

$x_i = 1$ in case of success in the ith trial, 

$x_i = 0$ in case of failure in the ith trial. 

Since the trials are independent, these variables are also independent. 
Moreover 

$$E(x_i) = E(x_i^2) = p_i$$

and the dispersion of $x_i$ is 

$$p_i - p_i^2 = p_i q_i.$$

The dispersion $B_n$ of the sum 

$$x_1 + x_2 + \cdots + x_n$$

is the sum of the dispersions of its terms, that is, 

$$B_n = p_1 q_1 + p_2 q_2 + \cdots + p_n q_n \le \frac{n}{4}.$$

At the same time, the former sum represents the number of successes m. 
Now, applying the results established in Chap. X, Sec. 2, we arrive 
at this conclusion: Denoting by P the probability of the inequality 

$$\left|\frac{m}{n} - p\right| < \epsilon$$

we shall have 

$$P > 1 - \frac{B_n}{n^2\epsilon^2} \ge 1 - \frac{1}{4n\epsilon^2}.$$

It now suffices to take 

$$n > \frac{1}{4\eta\epsilon^2}$$

to have 

$$P > 1 - \eta$$

where $\eta$ is an arbitrary positive number no matter how small. That 
completes the proof of Poisson's theorem. 
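
The last step of the proof is easy to turn into a small calculation. The following sketch uses exact fractions; the function names and the sample values $\epsilon = 1/20$, $\eta = 1/100$ and the particular trial probabilities are assumptions for illustration only.

```python
import math
from fractions import Fraction

def trials_needed(eps, eta):
    """Smallest integer n with n > 1 / (4 * eta * eps^2)."""
    bound = 1 / (4 * Fraction(eta) * Fraction(eps) ** 2)
    return math.floor(bound) + 1

def guaranteed_probability(probs, eps):
    """Lower limit 1 - B_n / (n^2 eps^2) for the given trial probabilities."""
    probs = [Fraction(p) for p in probs]
    n = len(probs)
    b_n = sum(p * (1 - p) for p in probs)   # B_n = sum of p_i q_i
    return 1 - b_n / (n * n * Fraction(eps) ** 2)

eps, eta = Fraction(1, 20), Fraction(1, 100)
n = trials_needed(eps, eta)
# Assumed probabilities varying from trial to trial.
probs = [Fraction(1, 3), Fraction(2, 3)] * (n // 2) + [Fraction(1, 2)] * (n % 2)
print(n, float(guaranteed_probability(probs, eps)))
```

Because $B_n \le n/4$ whatever the individual probabilities are, the guaranteed probability exceeds $1 - \eta$ for this n.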

Evidently Bernoulli's theorem is contained in Poisson's theorem as a 
particular case when 

$$p_1 = p_2 = \cdots = p_n = p.$$

Poisson himself attached great importance to his theorem and adopted 
for it the name of the "law of large numbers," which is still used by many 
authors. However, it appears more proper to reserve this name to the 
theorem established in Chap. X, Sec. 2, which is due to Tshebysheff. 

2. Let us consider n series each consisting of s independent trials with 
the constant probability p. Also, let 

$$m_1, m_2, \ldots m_n$$

represent the number of successes in each of these n series. Stochastic 
variables 

$$x_1 = (m_1 - sp)^2, \quad x_2 = (m_2 - sp)^2, \ \ldots \ x_n = (m_n - sp)^2$$

are independent and identical. Their common mathematical expectation 
is spq. The law of large numbers can be applied to these variables 
and leads immediately to the conclusion: The probability of the inequality 

$$\left|\frac{\sum_{i=1}^{n}(m_i - sp)^2}{n} - spq\right| < \epsilon$$

can be brought as near as we please to 1 (or certainty) if the number of 
series n is sufficiently large. 

Substituting $\epsilon spq$ for $\epsilon$ and dividing through by spq, we may state the 
same proposition as follows: The probability of the inequalities 

$$1 - \epsilon < \frac{\sum_{i=1}^{n}(m_i - sp)^2}{Npq} < 1 + \epsilon,$$

where N = ns is the total number of trials in all n series, can be brought 
as near to 1 as we please, if the number of series is sufficiently large. 

The law of large numbers can be legitimately applied to the variables 

$$x_i = |m_i - sp|; \qquad i = 1, 2, 3, \ldots$$

with the common mathematical expectation 

$$M_s = 2spq\, C_{s-1}^{\mu-1} p^{\mu-1} q^{s-\mu}$$

where $\mu = [sp + 1]$, and leads to the following proposition: The probability 
of the inequalities 

$$1 - \epsilon < \frac{\sum_{i=1}^{n}|m_i - sp|}{nM_s} < 1 + \epsilon$$

can be brought as near to 1 as we please if the number of series is sufficiently 
large. 

For the sake of simplicity, let us use the notations 

$$A^2 = \frac{\sum_{i=1}^{n}(m_i - sp)^2}{n}, \qquad B = \frac{\sum_{i=1}^{n}|m_i - sp|}{n}.$$

The probabilities P and P' of the inequalities 

$$\sqrt{spq}\,(1 - \sigma) < A < \sqrt{spq}\,(1 + \sigma) \tag{1}$$

$$M_s(1 - \sigma) < B < M_s(1 + \sigma) \tag{2}$$

which are equivalent to 

$$(1 - \sigma)^2 < \frac{\sum_{i=1}^{n}(m_i - sp)^2}{nspq} < (1 + \sigma)^2$$

$$1 - \sigma < \frac{\sum_{i=1}^{n}|m_i - sp|}{nM_s} < 1 + \sigma$$

can both be made greater than $1 - \eta$, where $\eta$ is an arbitrarily small 
positive number. The probability of simultaneous materialization of 
(1) and (2) is not less than 

$$P + P' - 1 > 1 - 2\eta.$$

But whenever (1) and (2) hold simultaneously, we have 

$$\frac{\sqrt{spq}}{M_s}\,\frac{1 - \sigma}{1 + \sigma} < \frac{A}{B} < \frac{\sqrt{spq}}{M_s}\,\frac{1 + \sigma}{1 - \sigma}. \tag{3}$$

Therefore the probability of these inequalities is again $> 1 - 2\eta$. Now 
let us take 

$$\sigma = \frac{\tau}{2 + \tau}$$

where $\tau$ is another positive number arbitrarily chosen. Then 

$$\frac{1 + \sigma}{1 - \sigma} = 1 + \tau; \qquad \frac{1 - \sigma}{1 + \sigma} > 1 - \tau.$$

Hence, the inequalities 

$$\frac{\sqrt{spq}}{M_s}(1 - \tau) < \frac{A}{B} < \frac{\sqrt{spq}}{M_s}(1 + \tau)$$

follow from inequalities (3) and their probability is a fortiori $> 1 - 2\eta$. 
It suffices to take 

$$\tau = \frac{\epsilon M_s}{\sqrt{spq}}$$

to arrive at the following proposition: 
The probability of the inequality 

$$\left|\frac{A}{B} - \frac{\sqrt{spq}}{M_s}\right| < \epsilon$$

for a fixed $\epsilon$ and sufficiently large number of series can be made as near to 
1 as we please. 

If spq is somewhat large, the quotient 

$$\frac{\sqrt{spq}}{M_s}$$

differs but little from $\sqrt{\pi/2}$ (see Chap. IX, Prob. 2, page 177). Hence, 
when the number of series is large and the series themselves sufficiently 
long, we may expect with great probability that the quotient 

$$\frac{A}{B}$$

will not differ much from $\sqrt{\pi/2}$. 
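
How close $\sqrt{spq}/M_s$ is to $\sqrt{\pi/2}$ can be checked directly. The sketch below evaluates the closed expression for $M_s$ with $\mu = [sp + 1]$, compares it with the mean absolute deviation computed term by term, and prints the quotient for series of 50 trials with p = 1/2. The function names and sample values are assumptions for illustration.

```python
import math

def mean_abs_deviation(s, p):
    """M_s = 2 s p q C(s-1, mu-1) p^(mu-1) q^(s-mu), with mu = [sp + 1]."""
    q = 1.0 - p
    mu = math.floor(s * p + 1)
    return 2 * s * p * q * math.comb(s - 1, mu - 1) * p ** (mu - 1) * q ** (s - mu)

def direct_mean_abs_deviation(s, p):
    """E|m - sp| summed term by term over the binomial distribution."""
    q = 1.0 - p
    return sum(abs(m - s * p) * math.comb(s, m) * p ** m * q ** (s - m)
               for m in range(s + 1))

s, p = 50, 0.5
ratio = math.sqrt(s * p * (1 - p)) / mean_abs_deviation(s, p)
print(mean_abs_deviation(s, p), ratio, math.sqrt(math.pi / 2))
```

For s = 50 the quotient is about 1.26, already within 0.01 of $\sqrt{\pi/2} \approx 1.2533$.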
Divergence Coefficient 

3. The considerations of the preceding section can be generalized. 
Let us consider again n series containing s trials each, and let 

$$m_1, m_2, \ldots m_n$$

represent the numbers of successes in each of these series. Without 
specifying the nature of the trials (which can be independent or dependent) 
we shall denote by p the mean probability in all N = ns trials and 
by q = 1 - p its complement. Again considering the quotient 

$$Q = \frac{\sum_{i=1}^{n}(m_i - sp)^2}{Npq},$$

we seek its mathematical expectation 

$$E(Q) = D.$$

When all the N trials are of the Bernoullian type, D = 1. But it is also 
possible to imagine cases when D > 1 or D < 1. Lexis calls $\sqrt{D}$ the 
"coefficient of dispersion." We shall call D itself the "theoretical 
divergence coefficient." If $m_1, m_2, \ldots m_n$ are actually observed 
frequencies in n series, the quotient 

$$D' = \frac{\sum_{i=1}^{n}(m_i - sp)^2}{Npq}$$

may be called "empirical divergence coefficient." Then, if the law of 
large numbers can be applied to variables 

$$x_i = \frac{(m_i - sp)^2}{spq}; \qquad i = 1, 2, 3, \ldots$$

we can expect with probability, approaching certainty as near as we please, 
that the inequality 

$$|D' - D| < \epsilon$$

will be fulfilled for an adequately large number of series. 

Thus far we have not specified the nature of the trials. Now we shall 
suppose that all N = ns trials, distributed in n series, are independent 
but with probabilities varying in general from trial to trial. Let 

$$p_{1i}, p_{2i}, \ldots p_{si} \qquad (i = 1, 2, \ldots n)$$

be the probabilities in successive trials of the ith series. Their mean 

$$p_i = \frac{p_{1i} + p_{2i} + \cdots + p_{si}}{s}$$

is the mean probability in the ith series. Finally 

$$p = \frac{p_1 + p_2 + \cdots + p_n}{n}$$

is the mean probability in all N = ns trials. As to the expectation of 
$(m_i - sp)^2$, we find 

$$E(m_i - sp)^2 = E(m_i - sp_i + s(p_i - p))^2 = E(m_i - sp_i)^2 + s^2(p_i - p)^2$$

since 

$$E(m_i - sp_i) = 0.$$

On the other hand, 

$$E(m_i - sp_i)^2 = \sum_{j=1}^{s} p_{ji} q_{ji} = sp_i - \sum_{j=1}^{s} p_{ji}^2$$

and 

$$\sum_{j=1}^{s}(p_i - p_{ji})^2 = \sum_{j=1}^{s} p_{ji}^2 - sp_i^2$$

whence 

$$E(m_i - sp_i)^2 = sp_i - sp_i^2 - \sum_{j=1}^{s}(p_i - p_{ji})^2.$$

Now, letting i take values 1, 2, . . . n and taking the sum of the 
results, we get 

$$\sum_{i=1}^{n} E(m_i - sp_i)^2 = nsp - s\sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n}\sum_{j=1}^{s}(p_i - p_{ji})^2.$$

But 

$$s\sum_{i=1}^{n}(p - p_i)^2 = -nsp^2 + s\sum_{i=1}^{n} p_i^2$$

whence finally 

$$D = 1 + \frac{s - 1}{npq}\sum_{i=1}^{n}(p - p_i)^2 - \frac{1}{Npq}\sum_{i=1}^{n}\sum_{j=1}^{s}(p_i - p_{ji})^2.$$

Two particular cases deserve special attention. 
Lexis' Case. Probabilities remain the same within each series, 
but vary from series to series. In this case $p_{ji} = p_i$ and the expression of 
D becomes: 

$$D = 1 + \frac{s - 1}{npq}\sum_{i=1}^{n}(p - p_i)^2.$$

The theoretical divergence coefficient in this case is always greater than 
1 and may be arbitrarily large. 

Poisson's Case. The probabilities of the corresponding trials in all 
series are the same, so that 

$$p_{ji} = \pi_j$$

and 

$$p = p_i = \frac{\pi_1 + \pi_2 + \cdots + \pi_s}{s}.$$

In this case the divergence coefficient 

$$D = 1 - \frac{\sum_{j=1}^{s}(p - \pi_j)^2}{spq}$$

is always less than 1. 

Since the law of large numbers evidently is applicable to variables 

$$x_i = \frac{(m_i - sp)^2}{spq}$$

we may expect that the empirical divergence coefficient D' will not 
differ much from D if the number of series is sufficiently large. 

For numerical illustration let us consider 100 series each containing 
100 trials, such that in 50 series the probability is 2/5 and in the remaining 
50 series it is 3/5. Here we evidently have Lexis' case. The mean 
probability in all trials is 

$$p = \tfrac12$$

and 

$$\sum_{i=1}^{100}\left(\tfrac12 - p_i\right)^2 = 50 \cdot \tfrac{1}{100} + 50 \cdot \tfrac{1}{100} = 1.$$

Finally, 

$$D = 1 + \frac{99}{25} = 4.96.$$

Now, suppose that we combine in pairs series of 100 trials with 
probability 2/5 and series of 100 trials with probability 3/5 to form 50 
series each of 200 trials. Evidently we have here Poisson's case. The 
mean probability in each series again is p = 1/2 and 

$$\sum_{j=1}^{200}\left(\tfrac12 - \pi_j\right)^2 = 100 \cdot \tfrac{1}{100} + 100 \cdot \tfrac{1}{100} = 2.$$

Finally, 

$$D = 1 - \frac{2}{50} = 0.96.$$
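
Both numerical values can be reproduced from the general expression of D. The following sketch works with exact fractions and takes the two probabilities to be 2/5 and 3/5 (the values for which the results 4.96 and 0.96 come out); the function names are hypothetical, and the second function recomputes E(Q) directly from the expectations of $(m_i - sp)^2$ as an independent check.

```python
from fractions import Fraction

def divergence_coefficient(series):
    """Theoretical divergence coefficient D for independent trials.

    `series` is a list of n lists, each holding the s probabilities
    p_1i, ..., p_si of the trials of one series.
    """
    n, s = len(series), len(series[0])
    p_i = [sum(row) / s for row in series]          # mean probability per series
    p = sum(p_i) / n                                 # mean probability overall
    q = 1 - p
    between = sum((p - pi) ** 2 for pi in p_i)
    within = sum((pi - pji) ** 2 for pi, row in zip(p_i, series) for pji in row)
    return 1 + (s - 1) * between / (n * p * q) - within / (n * s * p * q)

def direct_divergence_coefficient(series):
    """E(Q) from E(m_i - sp)^2 = sum_j p_ji q_ji + s^2 (p_i - p)^2."""
    n, s = len(series), len(series[0])
    p_i = [sum(row) / s for row in series]
    p = sum(p_i) / n
    total = sum(sum(x * (1 - x) for x in row) + s * s * (pi - p) ** 2
                for pi, row in zip(p_i, series))
    return total / (n * s * p * (1 - p))

a, b = Fraction(2, 5), Fraction(3, 5)
lexis = [[a] * 100] * 50 + [[b] * 100] * 50      # 100 series of 100 trials
poisson = [[a] * 100 + [b] * 100] * 50           # 50 series of 200 trials
print(divergence_coefficient(lexis), divergence_coefficient(poisson))
```

The two results are 124/25 = 4.96 and 24/25 = 0.96, in agreement with the text.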

The consideration of the divergence coefficient may be useful in 
testing the assumed independence of trials and values of probabilities 
attached to these trials. In the simplest case of Bernoullian trials with 
a constant and known probability, the theoretical divergence coefficient 
is 1. Now, if the number of series is sufficiently large and the empirical 
divergence coefficient turns out to be considerably different from 1, 
we must admit with great probability that the trials we deal with are not 
of the supposed type. If, however, the empirical divergence coefficient 
turns out to be near 1, that does not conclusively prove the hypothesis 
concerning the independence of trials and the assumed value of the 
probability. It only makes this hypothesis plausible. 

There are cases of dependent trials (complex chains considered by 
Markoff) in which the theoretical divergence coefficient is exactly 1 and 
the probability of an event has the same constant value in each trial, 
insofar as the results of other trials remain unknown. Cases like that 
may easily be mistaken for Bernoullian trials without further detailed 
study of the entire course of trials. 

4. When there is good reason to believe that the trials are independent 
with a constant but unknown probability, we cannot in all rigor find the 
value of the empirical divergence coefficient 

$$D' = \frac{\sum_{i=1}^{n}(m_i - sp)^2}{Npq}$$

to compare it with the theoretical divergence coefficient D = 1, since p 
remains unknown. 

But, relying on Bernoulli's theorem, we can take the quotient 

$$\frac{M}{N}$$

where 

$$M = m_1 + m_2 + \cdots + m_n$$

as an approximate value of p. By taking p = M/N in the preceding 
expression for D' we get another number 

$$D'' = \frac{N\sum_{i=1}^{n}\left(m_i - s\frac{M}{N}\right)^2}{M(N - M)}$$

which in general is close to D'. However, considering $m_1, m_2, \ldots m_n$ 
not as observed but as eventual numbers of successes in n series, the 
mathematical expectation of D'' is different from 1. To avoid this 
difficulty, it is better to consider a slightly different quotient 

$$Q = \frac{n(N - 1)}{(n - 1)M(N - M)}\sum_{i=1}^{n}\left(m_i - s\frac{M}{N}\right)^2.$$

For this quotient there exists a theorem discovered and proved for the 
first time by the eminent Russian statistician Tschuprow. 

Theorem. The mathematical expectation of Q is rigorously equal to 1. 

Proof. Here we shall develop the proof given by Markoff. The 
above given expression of Q presents itself in the form 0/0 and therefore 
has no meaning in two cases: M = 0 or M = N. For these exceptional 
cases we set Q = 1 by definition. If neither M = 0 nor M = N, we 
can present Q in the form 

$$Q = \frac{n(N - 1)}{(n - 1)M(N - M)}\left[\sum_{i=1}^{n} m_i^2 - \frac{M^2}{n}\right]. \tag{4}$$

Considering $m_1, m_2, \ldots m_n$ as stochastic variables assuming integral 
values from 0 to s, the probability of a definite system of values 

$$m_1, m_2, \ldots m_n$$

is 

$$P = \frac{s!}{m_1!(s - m_1)!} \cdot \frac{s!}{m_2!(s - m_2)!} \cdots \frac{s!}{m_n!(s - m_n)!}\, p^M q^{N-M}.$$

To get the expectation of Q we must multiply it by P and take the 
sum 

$$E(Q) = \sum PQ$$

extended over all non-negative integers $m_1, m_2, \ldots m_n$, each of them 
not exceeding s. To perform this multiple summation we first collect 
all terms with a given sum 

$$m_1 + m_2 + \cdots + m_n = M.$$

¹ The theorem itself and its proof given by Markoff can be extended to the case of 
series of unequal length. 
Let the result of this summation be $S_M$. Then it remains to take the 
sum 

$$\sum_{M=0}^{N} S_M$$

to have the desired expression E(Q). To this end we first separate two 
terms corresponding to M = 0 and M = N. In the former case 

$$m_1 = m_2 = \cdots = m_n = 0$$

and the probability of such an event is $q^N$ while Q = 1. In the latter 
case 

$$m_1 = m_2 = \cdots = m_n = s$$

the probability of which is $p^N$ while again Q = 1. Thus 

$$E(Q) = p^N + q^N + \sum_{M=1}^{N-1} S_M.$$

To find $S_M$ we observe that the denominator of Q has a constant value 
when summation is performed over variable integers $m_1, m_2, \ldots m_n$ 
connected by the relation 

$$m_1 + m_2 + \cdots + m_n = M.$$

Hence, it suffices to find two sums 

$$\sum P \quad \text{and} \quad \sum P m_i^2$$

extended over integers $m_1, m_2, \ldots m_n$ varying within limits 0 and s 
and having the sum M. To this end consider the function 

$$V = (pte^{\xi_1} + q)^s (pte^{\xi_2} + q)^s \cdots (pte^{\xi_n} + q)^s$$

involving n + 1 arbitrary variables $t, \xi_1, \xi_2, \ldots \xi_n$. When developed, 
V consists of terms of the form 

$$P\, t^{m_1 + m_2 + \cdots + m_n} e^{m_1\xi_1 + m_2\xi_2 + \cdots + m_n\xi_n}.$$

Evidently we obtain the sum $\sum P$ by setting $\xi_1 = \xi_2 = \cdots = \xi_n = 0$ 
and taking the coefficient of $t^M$ in the expansion 

$$V = (pt + q)^N.$$

Thus 

$$\sum P = \frac{N!}{M!(N - M)!}\, p^M q^{N-M}. \tag{5}$$

To find $\sum P m_i^2$ take the second derivative 

$$\frac{\partial^2 V}{\partial \xi_i^2}$$
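and after setting $\xi_1 = \xi_2 = \cdots = \xi_n = 0$, expand 

$$s(s - 1)p^2t^2(pt + q)^{N-2} + spt(pt + q)^{N-1}$$

and take the coefficient of $t^M$. Thus we find 

$$\sum P m_i^2 = \left[s(s - 1)\frac{(N - 2)!}{(M - 2)!(N - M)!} + s\frac{(N - 1)!}{(M - 1)!(N - M)!}\right] p^M q^{N-M}. \tag{6}$$

Referring to (4), (5), and (6), we easily get 

$$S_M = \frac{n(N - 1)}{(n - 1)M(N - M)} \cdot \frac{N(N - 2)!}{n(M - 1)!(N - M)!}\,[nN - n + (N - n)(M - 1) - M(N - 1)]\, p^M q^{N-M};$$

or, after obvious simplifications, 

$$S_M = \frac{N!}{M!(N - M)!}\, p^M q^{N-M}.$$

Hence 

$$\sum_{M=1}^{N-1} S_M = (p + q)^N - p^N - q^N = 1 - p^N - q^N,$$

and finally 

$$E(Q) = 1.$$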
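
For small n and s the theorem can be confirmed by brute force, summing PQ over every system of values $m_1, \ldots m_n$. The sketch below is only a check under stated assumptions: it takes Q in the form $\frac{n(N-1)}{(n-1)M(N-M)}\sum (m_i - M/n)^2$, sets Q = 1 in the two exceptional cases, and uses hypothetical function and variable names.

```python
from fractions import Fraction
from itertools import product
from math import comb

def expectation_of_Q(n, s, p):
    """Exact E(Q) for n series of s Bernoullian trials with probability p."""
    p = Fraction(p)
    q = 1 - p
    N = n * s
    total = Fraction(0)
    for ms in product(range(s + 1), repeat=n):
        M = sum(ms)
        prob = p ** M * q ** (N - M)
        for m in ms:
            prob *= comb(s, m)                  # probability of this system
        if M == 0 or M == N:
            Q = Fraction(1)                     # exceptional cases, by definition
        else:
            dev = sum((Fraction(m) - Fraction(M, n)) ** 2 for m in ms)
            Q = Fraction(n * (N - 1), (n - 1) * M * (N - M)) * dev
        total += prob * Q
    return total

print(expectation_of_Q(3, 4, Fraction(1, 3)), expectation_of_Q(2, 3, Fraction(2, 5)))
```

The enumeration grows as $(s + 1)^n$, so it is practical only for very small cases, but in those the expectation comes out exactly 1.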

Markoff, using the same method, succeeded in finding the explicit 
expression of the expectation 

$$E(Q - 1)^2.$$

Since there is no difficulty in finding this expression except for somewhat 
tedious calculations, we give it here without entering into details 
of the proof: 

$$E(Q - 1)^2 = \frac{2N(N - n)}{(n - 1)(N - 2)(N - 3)} \sum_{M=1}^{N-1} \frac{M - 1}{M} \cdot \frac{N - M - 1}{N - M}\, C_N^M p^M q^{N-M},$$

whence the following inequality immediately follows: 

$$E(Q - 1)^2 < \frac{2N(N - n)}{(n - 1)(N - 2)(N - 3)}.$$

In case $n \ge 5$ a still simpler inequality holds: 

$$E(Q - 1)^2 < \frac{2}{n - 1}. \tag{7}$$
Let R be the probability of the inequality 

$$Q \ge 1 + \epsilon,$$

where $\epsilon$ is a positive number. Applying the same reasoning to inequality 
(7) as was used in establishing Tshebysheff's lemma, we find that 

$$R < \frac{2}{(n - 1)\epsilon^2}.$$

Likewise, denoting by R' the probability of the inequality 

$$Q \le 1 - \epsilon,$$

we have 

$$R' < \frac{2}{(n - 1)\epsilon^2}.$$

Thus, in a large number of series it becomes very unlikely that the 
value of Q found in actual experiment would lie outside of the interval 
$(1 - \epsilon,\ 1 + \epsilon)$. For instance, the probability for $Q \ge 2$ in 100 series is 
surely less than 

$$\frac{2}{99}$$

or nearly 0.02. However, this limit is much too high. It would be 
greatly desirable to have a good approximate expression for the probability 
of either one of the inequalities 

$$Q \ge 1 + \epsilon \quad \text{or} \quad Q \le 1 - \epsilon.$$

But this important and difficult problem has not yet been solved. 

5. In order to illustrate the foregoing theoretical considerations we 
turn to experiments reported by Charlier in his book "Vorlesungen 
über die Grundzüge der mathematischen Statistik" (Lund, 1920). He 
made 10,000 drawings of single cards from a complete deck of 52 cards 
(each card taken being returned before the next drawing), and noted 
the frequency of black cards. The drawings were divided into 1,000 
series of 10 cards, or into 200 series of 50 cards. The results are given 
in Tables I and II. 

Assuming the independence of trials and the constant probability 
p = 1/2, the theoretical divergence coefficient must be 1. Let us compare 
it with the empirical divergence coefficient derived from Tables I and II. 
To this end we multiply the squares of numbers in the second column 
by the numbers given in the third column. The results are: 

For 200 series of 50 cards: $\sum (m_i - ps)^2 = 2{,}487$ 

For 1,000 series of 10 cards: $\sum (m_i - ps)^2 = 2{,}419$ 
S(m, — ps)* = 2,419 
Table I.-Number of Black Cards in 200 Groups of 50 Cards Each 

Frequency    Difference m - 25    Number of groups with these frequencies 
   14             -11                   1 
   15             -10                   0 
   16              -9                   2 
   17              -8                   2 
   18              -7                   4 
   19              -6                   8 
   20              -5                   6 
   21              -4                  15 
   22              -3                  13 
   23              -2                  15 
   24              -1                  34 
   25               0                  14 
   26               1                  21 
   27               2                  26 
   28               3                  14 
   29               4                  10 
   30               5                   5 
   31               6                   5 
   32               7                   3 
   33               8                   2 


Table II.-Number of Black Cards in 1,000 Groups of 10 Cards Each 

Frequency    Difference m - 5     Number of groups with these frequencies 
    0              -5                   3 
    1              -4                  10 
    2              -3                  43 
    3              -2                 116 
    4              -1                 221 
    5               0                 247 
    6               1                 202 
    7               2                 115 
    8               3                  34 
    9               4                   9 
   10               5                   0 


Dividing these numbers by 10,000 · 1/4 = 2,500, we get the following 
empirical divergence coefficients: 

D' = 0.9948; D'' = 0.9676. 

Both are close to 1, so that the hypotheses of independence of trials 
and constant probability for each of them are in good agreement with 
empirical results. The second divergence coefficient, corresponding to 
more numerous groups, differs from 1 more than the first, corresponding 
to only 200 groups. But such a difference can be accounted for by 
fluctuations due to chance. 

Series of 50 trials are long enough to test the theorem established in 
Sec. 2 of this chapter. The quantities denoted there by A and B are 
here correspondingly: 

$$A^2 = \frac{2487}{200}; \qquad A = 3.5263$$

$$B = \frac{561}{200}; \qquad B = 2.805$$

whence 

$$\frac{A}{B} = 1.2571$$

while 

$$\sqrt{\frac{\pi}{2}} = 1.2533.$$

Again the difference, only about $4 \cdot 10^{-3}$, is rather small. 

In this example, the probability of drawing a black card was assumed 
to be 1/2. If we do not know the probability, but suppose it to be 
constant throughout 10,000 independent trials, we must consider the 
coefficient 

$$Q = \frac{n(N - 1)}{(n - 1)M(N - M)}\sum_{i=1}^{n}\left(m_i - s\frac{M}{N}\right)^2.$$

In our example 

$$n = 1{,}000; \quad N = 10{,}000; \quad M = 4{,}933; \quad s = 10; \quad s\frac{M}{N} = 4.933.$$

To evaluate the sum 

$$S = \sum_{i=1}^{1,000}(m_i - 4.933)^2$$

we write it in the form 

$$S = \sum_{i=1}^{1,000}(m_i - 5)^2 + 0.134\sum_{i=1}^{1,000}(m_i - 5) + 1{,}000 \cdot (0.067)^2.$$

Now 

$$\sum_{i=1}^{1,000}(m_i - 5)^2 = 2{,}419$$

$$1{,}000 \cdot (0.067)^2 = 4.489$$

$$0.134\sum_{i=1}^{1,000}(m_i - 5) = -8.978$$

$$S = 2{,}414.51$$

This is to be multiplied by the number 

$$\frac{n(N - 1)}{(n - 1)M(N - M)} = \frac{1}{2497.3}.$$

The result is 

$$Q = 0.9668,$$

near enough to 1 for us to consider the hypothesis of independence of 
trials and the constant value of probability as in agreement with experimental 
data. 
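
The arithmetic of this section can be repeated from the two tables. The sketch below transcribes each table as (difference, number of groups) pairs; for Table II the counts for frequencies 9 and 10 are taken as 9 and 0, which is an assumption, though it is the only choice consistent with 1,000 groups, the sum 2,419 and M = 4,933. Function and variable names are hypothetical.

```python
import math

# (difference m - sp, number of groups), transcribed from Tables I and II
TABLE_I = [(-11, 1), (-10, 0), (-9, 2), (-8, 2), (-7, 4), (-6, 8), (-5, 6),
           (-4, 15), (-3, 13), (-2, 15), (-1, 34), (0, 14), (1, 21), (2, 26),
           (3, 14), (4, 10), (5, 5), (6, 5), (7, 3), (8, 2)]
TABLE_II = [(-5, 3), (-4, 10), (-3, 43), (-2, 116), (-1, 221), (0, 247),
            (1, 202), (2, 115), (3, 34), (4, 9), (5, 0)]

def summarize(table, center):
    """Return n, M, sum of squared differences, sum of absolute differences."""
    n = sum(count for _, count in table)
    M = sum((center + d) * count for d, count in table)
    sum_sq = sum(d * d * count for d, count in table)
    sum_abs = sum(abs(d) * count for d, count in table)
    return n, M, sum_sq, sum_abs

def tschuprow_Q(table, s, center):
    """Q = n(N-1) / ((n-1) M (N-M)) * sum (m_i - sM/N)^2."""
    n, M, _, _ = summarize(table, center)
    N = n * s
    mean = M / n                                  # equals s * M / N
    S = sum((center + d - mean) ** 2 * count for d, count in table)
    return n * (N - 1) / ((n - 1) * M * (N - M)) * S

n1, M1, sq1, abs1 = summarize(TABLE_I, 25)
n2, M2, sq2, abs2 = summarize(TABLE_II, 5)
A, B = math.sqrt(sq1 / n1), abs1 / n1
print(sq1 / 2500, sq2 / 2500, A / B, tschuprow_Q(TABLE_II, 10, 5))
```

Both tables give M = 4,933, as they must, since they describe the same 10,000 drawings.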


Examples of Dependent Trials 

6. So far we have dealt only with independent variables. But the 
law of large numbers holds, under certain conditions, even in the case of 
dependent variables. Leaving aside generalities, we shall show the appli¬ 
cation of the law of large numbers to a few interesting problems involving 
dependent variables. 

Let us consider first a Bernoullian series consisting of n + 1 
independent trials with the same probability p for an event E, the opposite 
event being denoted by F. We associate with trials 1, 2, . . . n variables 
$x_1, x_2, \ldots x_n$ defined as follows: 

    $x_i = 1$ if E occurs in trials i and i + 1, 

    $x_i = 0$ in all other cases. 

The probability of $x_i = 1$ evidently is $p^2$ when nothing is known about 
the values of other variables. But if we know that $x_{i-1} = 1$, which 
implies the occurrence of E in the ith trial, then the probability of $x_i = 1$ 
is p. Thus, consecutive variables are dependent. However, $x_i$ and $x_k$ 
are independent if $|k - i| > 1$, as we can easily see. Since 

    $$E(x_i) = E(x_i^2) = p^2 \cdot 1 + (1 - p^2) \cdot 0 = p^2$$

the expectation of the sum $x_1 + x_2 + \cdots + x_n$ will be 

    $$E(x_1 + x_2 + \cdots + x_n) = np^2.$$

As to the dispersion of this sum, it can be expressed as follows: 

    $$B_n = \sum_{i=1}^{n} E(x_i - p^2)^2 + 2\sum_{j>i} E(x_i - p^2)(x_j - p^2).$$

Now 

    (8)   $$E(x_i - p^2)^2 = E(x_i^2) - 2p^2 E(x_i) + p^4 = p^2(1 - p^2)$$

and 

    (9)   $$E(x_i - p^2)(x_j - p^2) = E(x_i - p^2) \cdot E(x_j - p^2) = 0$$

for j > i + 1 because then $x_i$ and $x_j$ are independent. But 

    (10)  $$E(x_i - p^2)(x_{i+1} - p^2) = E(x_i x_{i+1}) - p^4 = p^3 - p^4$$

since the probability of simultaneous events 

    $$x_i = 1, \quad x_{i+1} = 1$$

is $p^3$. Taking into account (8), (9), and (10), we find 

    $$B_n = np^2 q(3p + 1) - 2p^3 q$$

and the condition 

    $$\frac{B_n}{n^2} \to 0 \quad \text{as} \quad n \to \infty$$

is satisfied. Hence, the law of large numbers holds for variables 
$x_1, x_2, \ldots x_n$. To express it in the simplest form, it suffices to notice that 
the sum 

    $$x_1 + x_2 + \cdots + x_n$$

represents the number of pairs EE occurring in consecutive trials of the 
Bernoullian series of n + 1 trials. Let us denote the frequency of such 
pairs by m. Then, referring to the law of large numbers, we get the 
following proposition: 

If in n consecutive pairs of Bernoullian trials the frequency of double 
successes EE is m, then the probability of the inequality 

    $$\left|\frac{m}{n} - p^2\right| < \epsilon$$

will approach 1 as near as we please, when n becomes sufficiently large. 
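
For small n the expectation $np^2$ and the dispersion $B_n = np^2 q(3p + 1) - 2p^3 q$ can be confirmed by listing all outcomes of the n + 1 trials. The sketch below is an illustration only; the function name `pair_count_moments` and the choice p = 1/3 are assumptions made here, not part of the text.

```python
from fractions import Fraction
from itertools import product

def pair_count_moments(n, p):
    """Exact mean and dispersion of x_1 + ... + x_n, found by enumerating
    all 2^(n+1) outcomes of the n + 1 Bernoullian trials."""
    q = 1 - p
    mean = Fraction(0)
    second = Fraction(0)
    for outcome in product((0, 1), repeat=n + 1):
        successes = sum(outcome)
        prob = p**successes * q**(n + 1 - successes)
        # number of indices i for which E occurs in trials i and i + 1
        pairs = sum(outcome[i] * outcome[i + 1] for i in range(n))
        mean += prob * pairs
        second += prob * pairs * pairs
    return mean, second - mean * mean

p = Fraction(1, 3)
q = 1 - p
for n in (1, 2, 5, 8):
    mean, dispersion = pair_count_moments(n, p)
    formula = n * p**2 * q * (3 * p + 1) - 2 * p**3 * q
    print(n, mean == n * p**2, dispersion == formula)
```

The enumeration grows as $2^{n+1}$, so it serves only as a check for small n; the formula itself is what shows that $B_n/n^2$ tends to 0.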

7. Simple chains of trials, described in Chap. V, Sec. 1, offer a good 
example of dependent trials to which the law of large numbers can be 
applied. Let $p_1$ be the given probability of an event E in the first trial. 
According to the definition of a simple chain, the probability of E in 
any subsequent trial is $\alpha$ or $\beta$ according as E occurred or failed to occur 
in the preceding trial. By $p_n$ we denote the probability for E to occur 
in the nth trial when the results of other trials are unknown. Let 

    $$\delta = \alpha - \beta; \qquad p = \frac{\beta}{1 - \delta}.$$

Then, according to the developments in Chap. V, Sec. 2, 

    $$p_n = p + (p_1 - p)\delta^{n-1}$$

whence 

    $$\frac{p_1 + p_2 + \cdots + p_n}{n} = p + \frac{p_1 - p}{n} \cdot \frac{1 - \delta^n}{1 - \delta}$$

barring the trivial cases $\delta = 1$ or $\delta = -1$. It follows that p represents 
the limit of the mean probability in n trials when n increases indefinitely, 
and for that reason p may be called the mean probability in an infinite 
chain of trials. When it is known that E has occurred in the ith trial, its 
probability of occurring in some subsequent jth trial is given by 

    $$p_j^{(i)} = p + q\delta^{j-i}; \qquad q = 1 - p.$$
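
These expressions follow from the rule that E has probability $\alpha$ after an occurrence and $\beta$ after a failure, so they can be compared with a direct step-by-step computation. The sketch below is illustrative; the function names and the numerical values $\alpha = 1/2$, $\beta = 1/4$, $p_1 = 1/2$ are assumptions chosen here.

```python
from fractions import Fraction

def chain_probabilities(p1, alpha, beta, n):
    """p_1, ..., p_n from the rule p_{k+1} = alpha*p_k + beta*(1 - p_k)."""
    probs = [p1]
    for _ in range(n - 1):
        probs.append(alpha * probs[-1] + beta * (1 - probs[-1]))
    return probs

def closed_form(p1, alpha, beta, k):
    """p_k = p + (p_1 - p) * delta^(k-1), with delta = alpha - beta."""
    delta = alpha - beta
    p = beta / (1 - delta)
    return p + (p1 - p) * delta ** (k - 1)

def conditional_probability(alpha, beta, steps):
    """Probability of E `steps` trials after a trial in which E occurred."""
    prob = Fraction(1)               # E is known to have occurred
    for _ in range(steps):
        prob = alpha * prob + beta * (1 - prob)
    return prob

alpha, beta, p1 = Fraction(1, 2), Fraction(1, 4), Fraction(1, 2)
delta = alpha - beta
p = beta / (1 - delta)
step_by_step = chain_probabilities(p1, alpha, beta, 12)
print(all(step_by_step[k - 1] == closed_form(p1, alpha, beta, k)
          for k in range(1, 13)))
print(conditional_probability(alpha, beta, 3) == p + (1 - p) * delta**3)
```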

In the usual way we associate with trials 1, 2, 3, . . . variables 
$x_1, x_2, x_3, \ldots$ so that in general 

    $x_i = 1$ when E occurs in the ith trial 

    $x_i = 0$ when E fails to occur in the ith trial. 

Evidently 

    $$E(x_i) = E(x_i^2) = p_i.$$

In order to prove that the law of large numbers can be applied to 
variables $x_1, x_2, x_3, \ldots$ we must have an idea of the behavior of $B_n$ 
for large n. By definition 

    $$B_n = E(x_1 - p_1 + x_2 - p_2 + \cdots + x_n - p_n)^2 = \sum_{i=1}^{n} E(x_i - p_i)^2 + 2\sum_{j>i} E[(x_i - p_i)(x_j - p_j)].$$

The first sum can easily be found. We have 

    $$E(x_i - p_i)^2 = p_i - p_i^2 = pq + (q - p)(p_1 - p)\delta^{i-1} - (p_1 - p)^2\delta^{2i-2}$$

whence 

    $$A = \sum_{i=1}^{n} E(x_i - p_i)^2 \sim npq$$

neglecting terms which remain bounded. As to the second sum, we 
observe first that 

    $$E(x_i - p_i)(x_j - p_j) = E(x_i x_j) - p_i p_j.$$

Again, since the probability of 

    $$x_i x_j = 1$$

is evidently $p_i p_j^{(i)}$ we have 

    $$E(x_i x_j) = p_i p_j^{(i)}$$

and 

    $$E(x_i - p_i)(x_j - p_j) = p_i(p_j^{(i)} - p_j) = pq\delta^{j-i} + (p_1 - p)(q - p)\delta^{j-1} - (p_1 - p)^2\delta^{i+j-2}.$$

Now, for a fixed i = 1, 2, . . . n - 1, we must take the sum of these 
expressions letting j run over i + 1, i + 2, . . . n. The result of this 
summation is 

    $$pq\frac{\delta - \delta^{n-i+1}}{1 - \delta} + (p_1 - p)(q - p)\frac{\delta^{i} - \delta^{n}}{1 - \delta} - (p_1 - p)^2\frac{\delta^{2i-1} - \delta^{n+i-1}}{1 - \delta}.$$

Taking i = 1, 2, 3, . . . n - 1 and neglecting in the sum the terms 
which remain bounded, we get 

    $$B = \sum_{j>i} E(x_i - p_i)(x_j - p_j) \sim npq\frac{\delta}{1 - \delta}$$

whence 

    $$B_n = A + 2B \sim npq\frac{1 + \delta}{1 - \delta}.$$

This asymptotic equality suffices to show that 

    $$\frac{B_n}{n^2} \to 0 \quad \text{as} \quad n \to \infty.$$

Therefore the law of large numbers can be applied to variables 
$x_1, x_2, x_3, \ldots$. Since the sum 

    $$x_1 + x_2 + \cdots + x_n = m$$

represents the frequency of E in n trials, the law of large numbers in 
this particular case can be stated as follows: For a fixed $\epsilon > 0$, no matter 
how small, the probability of the inequality 

    $$\left|\frac{m}{n} - \frac{p_1 + p_2 + \cdots + p_n}{n}\right| < \epsilon$$

tends to 1 as $n \to \infty$. 
The arithmetic mean 

    $$\frac{p_1 + p_2 + \cdots + p_n}{n}$$

itself approaches the limit p. It is easy then to express the preceding 
theorem thus: The probability of the inequality 

    $$\left|\frac{m}{n} - p\right| < \epsilon$$

tends to 1 as $n \to \infty$. 

This proposition is of exactly the same type as Bernoulli's theorem, 
but applies to series of dependent trials. 
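
The asymptotic value $npq(1 + \delta)/(1 - \delta)$ of the dispersion can be compared with the exact dispersion of the frequency m, obtained by following the distribution of m trial by trial. This is a sketch under stated assumptions: the function name `chain_count_variance`, the bookkeeping by "last result", and the numerical values are choices made here for illustration.

```python
def chain_count_variance(p1, alpha, beta, n):
    """Exact dispersion B_n of the number of occurrences of E in n trials
    of a simple chain (first-trial probability p1, then alpha or beta)."""
    # state[e][k]: probability that E occurred k times so far and that the
    # last trial gave E (e = 1) or its opposite (e = 0).
    state = [[0.0] * (n + 1) for _ in range(2)]
    state[1][1] = p1
    state[0][0] = 1 - p1
    for _ in range(n - 1):
        nxt = [[0.0] * (n + 1) for _ in range(2)]
        for k in range(n):
            hit, miss = state[1][k], state[0][k]
            nxt[1][k + 1] += hit * alpha + miss * beta
            nxt[0][k] += hit * (1 - alpha) + miss * (1 - beta)
        state = nxt
    dist = [state[0][k] + state[1][k] for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(dist))
    return sum((k - mean) ** 2 * w for k, w in enumerate(dist))

alpha, beta, p1 = 0.5, 0.25, 0.5
delta = alpha - beta
p = beta / (1 - delta)
q = 1 - p
for n in (50, 200, 400):
    exact = chain_count_variance(p1, alpha, beta, n)
    print(n, exact / n, p * q * (1 + delta) / (1 - delta))
```

The two printed columns differ by a quantity of order 1/n, which is the effect of the bounded terms neglected in the text.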

8. Let a simple chain of N = ns trials be divided into n consecutive 
series each consisting of s trials; also, let $m_1, m_2, \ldots m_n$ be the 
frequencies of E in each of these series. When N is a large number, the 
mean probability in N trials differs little from the quantity denoted by p. 
It is natural to modify the definition of the divergence coefficient given 
in Sec. 3 by taking p instead of the variable mean probability in N trials. 
Thus we define 

    $$D = \frac{E\sum_{i=1}^{n} (m_i - sp)^2}{Npq}.$$

In our case, the variables 

    $$X_1 = (m_1 - sp)^2, \quad X_2 = (m_2 - sp)^2, \quad \ldots \quad X_n = (m_n - sp)^2$$

are neither identical nor independent, although the degree of dependence 
is evidently very slight. These variables can also be presented in the 
form 

    (11)  $$(x_a - p + x_{a+1} - p + \cdots + x_{a+s-1} - p)^2$$

taking successively a = 1, s + 1, 2s + 1, . . . (n - 1)s + 1. 

To find the mathematical expectation of (11) it suffices to notice that 

    $$E(x_i - p)^2 = E(x_i - p_i)^2 + (p_i - p)^2 = pq + (q - p)(p_1 - p)\delta^{i-1}$$

    $$E(x_i - p)(x_j - p) = E(x_i - p_i)(x_j - p_j) + (p_i - p)(p_j - p) = pq\delta^{j-i} + (p_1 - p)(q - p)\delta^{j-1}$$

and then proceed exactly as in the approximate evaluation of $B_n$ in Sec. 7. 
The final result is 

    $$E(x_a - p + x_{a+1} - p + \cdots + x_{a+s-1} - p)^2 = spq\frac{1 + \delta}{1 - \delta} - \frac{2pq\delta}{(1 - \delta)^2} + \frac{(q - p)(p_1 - p)(1 + \delta)}{(1 - \delta)^2}\delta^{a-1} + \frac{2pq}{(1 - \delta)^2}\delta^{s+1} - \frac{(q - p)(p_1 - p)}{(1 - \delta)^2}\left[(2s + 1) - (2s - 1)\delta\right]\delta^{a+s-1}.$$

For somewhat large s the two last terms in the right member are 
completely negligible; so is the third term if $a \ge s + 1$. Hence, with a good 
approximation, 

    $$E(X_1) = spq\frac{1 + \delta}{1 - \delta} - \frac{2pq\delta}{(1 - \delta)^2} + \frac{(q - p)(p_1 - p)(1 + \delta)}{(1 - \delta)^2}$$

and 

    $$E(X_i) = spq\frac{1 + \delta}{1 - \delta} - \frac{2pq\delta}{(1 - \delta)^2} \quad \text{if } i > 1$$

so that 

    $$D = \frac{1 + \delta}{1 - \delta} - \frac{2\delta}{s(1 - \delta)^2} + \frac{(q - p)(p_1 - p)(1 + \delta)}{Npq(1 - \delta)^2}.$$

Again, when N is large, the last term can be dropped and as a good 
approximation to D we can take 

    (12)  $$D = \frac{1 + \delta}{1 - \delta} - \frac{2\delta}{s(1 - \delta)^2}.$$

It can be shown that the law of large numbers holds for variables 
$X_1, X_2, \ldots X_n$ and therefore when n (or the number of series) is large, the 
empirical divergence coefficient is not likely to differ considerably from 
D as given by the above approximate formula. 
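
To see how little is lost in passing to the approximate formula (12), one may sum the two expectations quoted at the beginning of this derivation over every series and compare the result with (12). The sketch below is only an illustration; the function names and the values n = 400, s = 25, $\alpha = 1/2$, $\beta = 1/4$, $p_1 = 1/2$ are assumptions chosen here.

```python
def pair_moment(i, j, p1, alpha, beta):
    """E(x_i - p)(x_j - p) for i <= j in a simple chain, as quoted above."""
    delta = alpha - beta
    p = beta / (1 - delta)
    q = 1 - p
    if i == j:
        return p * q + (q - p) * (p1 - p) * delta ** (i - 1)
    return p * q * delta ** (j - i) + (p1 - p) * (q - p) * delta ** (j - 1)

def exact_divergence(n, s, p1, alpha, beta):
    """Sum of E(m_i - s*p)^2 over the n series, divided by N*p*q."""
    delta = alpha - beta
    p = beta / (1 - delta)
    q = 1 - p
    total = 0.0
    for block in range(n):
        a = block * s + 1                      # first trial of the series
        for i in range(a, a + s):
            total += pair_moment(i, i, p1, alpha, beta)
            for j in range(i + 1, a + s):
                total += 2 * pair_moment(i, j, p1, alpha, beta)
    return total / (n * s * p * q)

def approximate_divergence(delta, s):
    """Formula (12)."""
    return (1 + delta) / (1 - delta) - 2 * delta / (s * (1 - delta) ** 2)

print(exact_divergence(400, 25, 0.5, 0.5, 0.25))
print(approximate_divergence(0.25, 25))
```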

9. In order to see how far the theory of simple chains agrees with 
actual experiments, the author of this book himself has done extensive 
experimental work. To form a chain of trials, one can take two sets of 
cards containing red and black cards in different proportions, and 
proceed to draw one card at a time (returning it to the pack in which it 
belongs after each drawing) according to the following rules: At the 
outset one card is taken from a pack which we shall call the first set; 
then, whenever a red card is drawn, the next card is taken from the first 
set; but after a black card, the next one is taken from the second set. 
Evidently, these rules completely determine a series of trials possessing 
properties of a simple chain. In the first experiment the first pack 
contained 10 red and 10 black cards, while the second pack contained 5 
red and 15 black cards. Altogether, 10,000 drawings were made, and 
following their natural order, they were divided into 400 series of 25 
drawings each. The results are given in Table III. 

Table III. Distribution of Red Cards in 400 Series of 25 Cards 

Frequency of      Difference,      Number of series 
red cards, m      m - 8            with these frequencies 

      1               -7                   2 
      2               -6                   4 
      3               -5                   8 
      4               -4                  27 
      5               -3                  29 
      6               -2                  54 
      7               -1                  37 
      8                0                  52 
      9                1                  47 
     10                2                  44 
     11                3                  41 
     12                4                  20 
     13                5                  20 
     14                6                   7 
     15                7                   4 
     16                8                   3 
     17                9                   1 


The sum of the numbers in column 3 is 400, as it should be. Taking 
the sum of the products of numbers in columns 1 and 3, we get 3,323, which 
is the total number of red cards. The relative frequency of red cards in 
10,000 trials is, therefore, 

    0.3323. 

In our case 

    $$\alpha = \tfrac{1}{2}; \quad \beta = \tfrac{1}{4}; \quad \delta = \tfrac{1}{4}$$

and the mean probability p in an infinite series of trials 

    $$p = \frac{1/4}{1 - 1/4} = \frac{1}{3} = 0.3333.$$

Thus, the relative frequency observed differs from p only by $10^{-3}$ and 
in this respect the agreement between theory and experiment is very 
satisfactory. Now let us consider the theoretical divergence coefficient 
for which we have the approximate expression 

    $$D = \frac{1 + \delta}{1 - \delta} - \frac{2\delta}{s(1 - \delta)^2}.$$

Here we must substitute $\delta = 1/4$ and s = 25. The result is 

    D = 1.631, approximately. 

To find the empirical divergence coefficient we must first evaluate the 
sum 

    $$S = \sum \left(m - \tfrac{25}{3}\right)^2$$

extended over all 400 series. For the sake of easier calculation, we 
present S thus: 

    $$S = \sum (m - 8)^2 - \tfrac{2}{3}\sum (m - 8) + \tfrac{400}{9}.$$

Now from Table III we get 

    $$\sum (m - 8)^2 = 3{,}521; \qquad \sum (m - 8) = 123$$

whence 

    S = 3,483.4. 

Dividing this number by 20,000/9 = 2,222.2, we find the empirical 
divergence coefficient 

    D' = 1.568 

which differs from D = 1.631 by only about 0.06, well within reasonable 
limits. 
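
The totals quoted for this experiment can be recomputed from the table. In the sketch below the dictionary `table_iii` is a transcription made here for illustration; the number of series with m = 8 is entered as 52, which is the value consistent with the three totals stated in the text (400 series, 3,323 red cards, and 3,521 for the sum of squares about 8).

```python
from fractions import Fraction

# m -> number of series with m red cards (Table III, m = 8 taken as 52)
table_iii = {1: 2, 2: 4, 3: 8, 4: 27, 5: 29, 6: 54, 7: 37, 8: 52, 9: 47,
             10: 44, 11: 41, 12: 20, 13: 20, 14: 7, 15: 4, 16: 3, 17: 1}

s = 25
p = Fraction(1, 3)
q = 1 - p
delta = Fraction(1, 4)

series = sum(table_iii.values())
red = sum(m * k for m, k in table_iii.items())
sum_sq_about_8 = sum(k * (m - 8) ** 2 for m, k in table_iii.items())
S = sum(k * (m - s * p) ** 2 for m, k in table_iii.items())
N = series * s
empirical_D = S / (N * p * q)
theoretical_D = (1 + delta) / (1 - delta) - 2 * delta / (s * (1 - delta) ** 2)

print(series, red, sum_sq_about_8)
print(float(S), float(empirical_D), float(theoretical_D))
```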

10 . In two other experiments two packs were used: one containing 
13 red and 7 black cards, and another 7 red and 13 black cards. In 
one experiment the pack with 13 red cards was considered as the first 
deck, and in the other experiment it became the second deck. The 
new experiments were conducted in the same way as that described in 
Sec. 9, but they were both carried to 20,000 trials divided into 1,000 
series of 20 trials each. In the first experiment, we have 

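    $$\alpha = \tfrac{13}{20}; \quad \beta = \tfrac{7}{20}; \quad \delta = \tfrac{3}{10}; \quad p = \tfrac{1}{2}$$

and 

    D = 1.796, approximately, 

while the same quantities for the second experiment are 

    $$\alpha = \tfrac{7}{20}; \quad \beta = \tfrac{13}{20}; \quad \delta = -\tfrac{3}{10}; \quad p = \tfrac{1}{2}$$

and 

    D = 0.556, approximately. 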

The results of these experiments are recorded in the following two 
tables: 


Table IV. Concerning the First Experiment 

Frequency of      Difference,      Number of series 
red cards, m      m - 10           with these frequencies 

      2               -8                   3 
      3               -7                   5 
      4               -6                  18 
      5               -5                  36 
      6               -4                  59 
      7               -3                  93 
      8               -2                 103 
      9               -1                 117 
     10                0                 128 
     11                1                 121 
     12                2                 101 
     13                3                  93 
     14                4                  48 
     15                5                  39 
     16                6                  26 
     17                7                   7 
     18                8                   1 
     19                9                   1 
     20               10                   1 


Table V. Concerning the Second Experiment 

Frequency of      Difference,      Number of series 
red cards, m      m - 10           with these frequencies 

      5               -5                   2 
      6               -4                  10 
      7               -3                  48 
      8               -2                 112 
      9               -1                 193 
     10                0                 251 
     11                1                 201 
     12                2                 113 
     13                3                  56 
     14                4                   9 
     15                5                   5 

Taking the sum of the products of numbers in columns 1 and 3, we 
find 

    10,036 and 10,045 

as the total number of red cards in the first and second experiments. 
Dividing these numbers by 20,000, we have the following relative 
frequencies of red cards: 

    0.5018 and 0.50225 

extremely near to p = 0.5. From the first table we find that 

    $$\sum (m - 10)^2 = 8{,}924$$

summation being extended over all 1,000 series. Dividing this number 
by 20,000 · 1/4 = 5,000, we find the empirical divergence coefficient in 
the first experiment 

    D' = 1.785 

which comes close to 

    D = 1.796. 

Likewise, from the second table we find 

    $$\sum (m - 10)^2 = 2{,}709,$$

whence, dividing by 5,000, 

    D'' = 0.5418 

again close to 

    D = 0.5562. 

Thus, all the essential circumstances foreseen theoretically, for simple 
chains of trials, are in excellent agreement with our experiments. 
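
The figures quoted for the two experiments with 20,000 trials can be recomputed in the same manner. The dictionaries below are transcriptions of Tables IV and V made here for illustration, and the helper names `summarize` and `theoretical_divergence` are assumptions of this sketch.

```python
from fractions import Fraction

# m -> number of series with m red cards
table_iv = {2: 3, 3: 5, 4: 18, 5: 36, 6: 59, 7: 93, 8: 103, 9: 117, 10: 128,
            11: 121, 12: 101, 13: 93, 14: 48, 15: 39, 16: 26, 17: 7,
            18: 1, 19: 1, 20: 1}
table_v = {5: 2, 6: 10, 7: 48, 8: 112, 9: 193, 10: 251, 11: 201, 12: 113,
           13: 56, 14: 9, 15: 5}

def summarize(table, s, p):
    """Number of series, total of red cards, empirical divergence coefficient."""
    series = sum(table.values())
    red = sum(m * k for m, k in table.items())
    S = sum(k * (m - s * p) ** 2 for m, k in table.items())
    N = series * s
    return series, red, S / (N * p * (1 - p))

def theoretical_divergence(alpha, beta, s):
    """Approximate formula (12) with delta = alpha - beta."""
    delta = alpha - beta
    return (1 + delta) / (1 - delta) - 2 * delta / (s * (1 - delta) ** 2)

half = Fraction(1, 2)
first = summarize(table_iv, 20, half)
second = summarize(table_v, 20, half)
print(first[0], first[1], float(first[2]),
      float(theoretical_divergence(Fraction(13, 20), Fraction(7, 20), 20)))
print(second[0], second[1], float(second[2]),
      float(theoretical_divergence(Fraction(7, 20), Fraction(13, 20), 20)))
```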


Problems for Solution 

1. From an urn originally containing a white and b black balls, n balls are drawn 
in succession, each ball drawn being replaced by 1 + c (c > 0) balls of the same color 
before the next drawing. If m is the frequency of white balls, show that the 
probability of the inequality 

    $$\left|\frac{m}{n} - \frac{a}{a + b}\right| < \epsilon$$

does not tend to 1 as n increases indefinitely (Markoff, G. Pólya). 

Indication of the Proof. If $x_i = 1$ or $x_i = 0$, according as a white or a black ball 
appears in the ith drawing, we have 

    $$E(x_i) = \frac{a}{a + b}; \qquad E(x_i x_j) = \frac{a(a + c)}{(a + b)(a + b + c)} \quad \text{if } j \ne i.$$

Hence 

    $$B_n = E\left(x_1 + x_2 + \cdots + x_n - \frac{na}{a + b}\right)^2 = \frac{n^2 abc}{(a + b)^2(a + b + c)} + \frac{nab}{(a + b)(a + b + c)}.$$

2. Marbe's Problem. A group of exactly m uninterrupted successes E or failures F 
in a Bernoullian series of trials with the probability p for a success is called an "m 
sequence." If N is the frequency of m sequences in n trials, show that the probability 
of the inequality 

    $$\left|\frac{N}{n} - (p^m q^2 + p^2 q^m)\right| < \epsilon$$

for a fixed $\epsilon$ converges to 1 as n becomes infinite. 

Indication of the Proof. Associate with each of the $\mu = n - m + 1$ first trials 
variables $x_1, x_2, \ldots x_\mu$ assuming only two values, 0 and 1. For $1 < i < \mu$ we set 
$x_i = 1$ if, beginning with the ith trial, a succession of m letters E or F is preceded and 
followed by F or E. In all other cases $x_i = 0$. We set $x_1 = 1$ if, beginning with the 
first trial, there is a succession of m letters E or F ended by F or E; otherwise $x_1 = 0$. 
Finally, $x_\mu = 1$ if, beginning with the $\mu$th trial there is a succession of m letters E or F 
preceded by F or E; otherwise $x_\mu = 0$. Show that 

    $$E(x_1 + x_2 + \cdots + x_\mu) = (n - m - 1)(p^m q^2 + p^2 q^m) + 2(p^m q + pq^m)$$

    $$E[x_1 + x_2 + \cdots + x_\mu - n(p^m q^2 + p^2 q^m)]^2 = nP$$

where P remains bounded. 

3. The following interesting series of dependent trials has been suggested by S. 
Bernstein: Two urns contain white and black balls. The probabilities of drawing 
white balls from the first and second urns are, respectively, p and p'. The probabilities 
of drawing black balls from the same urns are q = 1 - p and q' = 1 - p'. Finally, 
the probability of taking a ball from the first urn at the outset of the trials is $\alpha$. A 
series of trials is uniquely defined by the following rule: Whenever a white ball is 
drawn (and returned), the next ball is drawn from the same urn; but when a black 
ball is drawn, the next ball is taken from the other urn. Let $\alpha_n$ be the probability 
that the nth ball will be drawn from the first urn when the results of other drawings 
remain unknown. Under the same assumption, let $p_n$ be the probability of the nth 
ball being white. Find general expressions of $\alpha_n$ and $p_n$. 

Hint: 

    $$\alpha_{n+1} = \alpha_n(p + p' - 1) + 1 - p'$$

whence 

    $$\alpha_n = \frac{1 - p'}{2 - p - p'} + \left(\alpha - \frac{1 - p'}{2 - p - p'}\right)(p + p' - 1)^{n-1}.$$

Also 

    $$p_n = \alpha_n p + (1 - \alpha_n)p'$$

whence 

    $$p_n = \frac{p + p' - 2pp'}{2 - p - p'} + \left(\alpha - \frac{1 - p'}{2 - p - p'}\right)(p - p')(p' + p - 1)^{n-1}.$$
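
The closed expressions in the hint can be compared with the recurrence from which they are derived. In this sketch `p2` stands for p', and the function names as well as the numerical values are assumptions introduced here for illustration.

```python
from fractions import Fraction

def by_recurrence(alpha, p, p2, n):
    """alpha_k and p_k for k = 1..n, computed step by step."""
    alphas, whites = [], []
    a = alpha
    for _ in range(n):
        alphas.append(a)
        whites.append(a * p + (1 - a) * p2)       # p_k = alpha_k p + (1 - alpha_k) p'
        a = a * (p + p2 - 1) + 1 - p2             # alpha_{k+1}
    return alphas, whites

def closed_forms(alpha, p, p2, k):
    """alpha_k and p_k from the closed expressions of the hint."""
    limit = (1 - p2) / (2 - p - p2)
    rate = (p + p2 - 1) ** (k - 1)
    alpha_k = limit + (alpha - limit) * rate
    p_k = (p + p2 - 2 * p * p2) / (2 - p - p2) + (alpha - limit) * (p - p2) * rate
    return alpha_k, p_k

alpha, p, p2 = Fraction(1, 3), Fraction(3, 5), Fraction(1, 4)
alphas, whites = by_recurrence(alpha, p, p2, 10)
print(all(closed_forms(alpha, p, p2, k) == (alphas[k - 1], whites[k - 1])
          for k in range(1, 11)))
```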


4. When it becomes known that in the ith trial a white ball was drawn, what are 
the probabilities $\alpha_j^{(i)}$ and $p_j^{(i)}$ of taking a ball from the first urn in the jth (j > i) trial 
and of drawing a white ball in the same trial? 

Hint: The probability that it was the first urn from which a white ball was 
drawn in the ith trial is determined by Bayes' formula: 

    $$\frac{\alpha_i p}{p_i}.$$

For $n \ge i + 1$ 

    $$\alpha_{n+1}^{(i)} = \alpha_n^{(i)}(p + p' - 1) + 1 - p'$$

whence 

    $$\alpha_j^{(i)} = \frac{1 - p'}{2 - p - p'} + \left(\frac{\alpha_i p}{p_i} - \frac{1 - p'}{2 - p - p'}\right)(p + p' - 1)^{j-i-1}$$

for $j \ge i + 1$. Furthermore 

    $$p_j^{(i)} = \alpha_j^{(i)} p + (1 - \alpha_j^{(i)})p'$$

for $j \ge i + 1$. 

5. From now on we shall assume p + p' = 1 or p' = q, q' = p. Show that the 
law of large numbers can be applied to variables $x_1, x_2, \ldots$ which are defined in 
the usual way: 

    $x_i = 1$ if a white ball is drawn in the ith trial, 
    $x_i = 0$ if a black ball is drawn in the ith trial. 

Indication of the Proof. Evidently $E(x_i) = E(x_i^2) = p_i$. Furthermore 

    $$B_n = \sum_{i=1}^{n} E(x_i - p_i)^2 + 2\sum_{j>i} E(x_i - p_i)(x_j - p_j).$$

Now 

    $$E(x_i - p_i)^2 = 2pq(1 - 2pq); \quad i > 1$$

    $$E(x_1 - p_1)^2 = pq + \alpha(1 - \alpha)(p - q)^2.$$

For j > i > 1 

    $$E(x_i - p_i)(x_j - p_j) = 0 \quad \text{if } j > i + 1$$

    $$E(x_i - p_i)(x_{i+1} - p_{i+1}) = pq(1 - 4pq).$$

For i = 1 and j > 1 

    $$E(x_1 - p_1)(x_j - p_j) = 0 \quad \text{if } j > 2$$

    $$E(x_1 - p_1)(x_2 - p_2) = \alpha p^2 + (1 - \alpha)q^2 - (1 - 2pq)(q + (p - q)\alpha).$$

Hence 

    $$B_n \sim 4pq(1 - 3pq)n$$

and the law of large numbers holds. It can be stated as follows: If in n trials the 
frequency of white balls is m, then the probability of the inequality 

    $$\left|\frac{m}{n} - (p^2 + q^2)\right| < \epsilon$$

tends to 1 as n tends to infinity for any given positive number $\epsilon$. 

6. Let $r = p^2 + q^2$ be the mean probability in infinitely many trials. Find the 
divergence coefficient 

    $$D = \frac{E\sum_{i=1}^{n} (m_i - sr)^2}{Nr(1 - r)}$$

when N = ns trials are divided in n consecutive groups containing s trials each. 

Indication of Solution. From the foregoing formulas it follows that 

    $$E(x_a - r + x_{a+1} - r + \cdots + x_{a+s-1} - r)^2 = 4spq(1 - 3pq) - 2pq(1 - 4pq)$$

if a > 1. Hence 

    $$E\sum_{i=2}^{n} (m_i - sr)^2 = 4Npq(1 - 3pq) - 4spq(1 - 3pq) - 2(n - 1)pq(1 - 4pq).$$

Again 

    $$E(m_1 - sr)^2 = 4spq(1 - 3pq) - 2pq(3 - 10pq) + p(1 - 6q + 12q^2 - 4q^3) - \alpha(p - q)(1 - 8pq)$$

so that finally 

    $$D = \frac{2 - 6pq}{1 - 2pq} - \frac{1 - 4pq}{s(1 - 2pq)} + \frac{(p - q)(p - \alpha)(1 - 8pq)}{2Npq(1 - 2pq)}.$$

For large N with a good approximation 

    $$D = \frac{2 - 6pq}{1 - 2pq} - \frac{1 - 4pq}{s(1 - 2pq)}.$$

7. Two sets of cards containing respectively 12 red and 4 black cards (the first 
deck) and 4 red and 12 black cards (the second deck) were used in the following 
experiment: The first card was taken from the first deck, and in the following trials, after 
a red card the next one was taken from the same deck, but after a black one the next 
card was taken from the other deck. Altogether 25,000 cards were drawn, and in their 
natural order were divided in 1,000 series of 25 cards each. The results are recorded 
in Table VI. How close is the agreement between this experiment and the theory? 

Table VI. Distribution of Red Cards in 1,000 Series of 25 Cards 

Frequency of      Difference,      Number of series 
red cards, m      m - 16           with these frequencies 

      6              -10                   1 
      7               -9                   1 
      8               -8                   1 
      9               -7                  12 
     10               -6                  13 
     11               -5                  43 
     12               -4                  65 
     13               -3                  92 
     14               -2                 101 
     15               -1                 162 
     16                0                  94 
     17                1                 164 
     18                2                  68 
     19                3                 110 
     20                4                  26 
     21                5                  28 
     22                6                  10 
     23                7                   7 
     24                8                   1 
     25                9                   1 

Ans. In the present case p = q' = 3/4, p' = q = 1/4. Mean probability in infinitely 
many trials: 

    $$p^2 + q^2 = \tfrac{5}{8} = 0.625.$$

Theoretical divergence coefficient: D = 1.384. Frequency of red cards: 15,696. 
Relative frequency: 

    15,696/25,000 = 0.62784, 

close to 0.625. 

Empirical divergence coefficient: D' = 1.3845, very close to 1.384. 

The probability of taking a card from the second deck is 0.25. Now, by actual 
counting, it was found that in 7,500 trials a card was taken from the second deck 
1,856 times. Hence, the relative frequency of this event in 7,500 trials is 

    1,856/7,500 = 0.2475, 

again very close to 0.25. 
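
The figures of this answer can be recomputed from Table VI together with the approximate expression for D found in Problem 6. The dictionary `table_vi` is a transcription made here for illustration, and the variable names are assumptions of this sketch.

```python
from fractions import Fraction

# m -> number of series with m red cards (Table VI)
table_vi = {6: 1, 7: 1, 8: 1, 9: 12, 10: 13, 11: 43, 12: 65, 13: 92, 14: 101,
            15: 162, 16: 94, 17: 164, 18: 68, 19: 110, 20: 26, 21: 28,
            22: 10, 23: 7, 24: 1, 25: 1}

s = 25
p = Fraction(3, 4)
q = 1 - p
r = p**2 + q**2                      # mean probability in infinitely many trials

series = sum(table_vi.values())
red = sum(m * k for m, k in table_vi.items())
N = series * s
S = sum(k * (m - s * r) ** 2 for m, k in table_vi.items())
empirical_D = S / (N * r * (1 - r))
theoretical_D = ((2 - 6 * p * q) / (1 - 2 * p * q)
                 - (1 - 4 * p * q) / (s * (1 - 2 * p * q)))

print(series, red, float(Fraction(red, N)))
print(float(theoretical_D), float(empirical_D))
```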
CHAPTER XII 


PROBABILITIES IN CONTINUUM 

1. In the preceding parts of this book, whenever we dealt with 
stochastic variables, it was understood that their range of variation was 
represented by a finite set of numbers. Although, for the sake of better 
understanding of the subject, it was natural to begin with this simplest 
case, there are many reasons why it is necessary to introduce into the 
calculus of probability stochastic variables with infinitely many values. 
Such variables present themselves naturally in many cases of the type of 
Buffon's needle problem which we had occasion to mention in Chap. VI. 

On the other hand, even in dealing with stochastic variables with a 
finite, but very large number of values, it is often profitable for the sake 
of approximate evaluations, to substitute for them fictitious variables 
with infinitely many values. Among these the most important ones by 
far are continuous variables. 

Case of One Variable 

2. Beginning with the case of a single continuous variable x, we must 
assume that its range of variation is known and represented by a given 
interval (a, b), finite or infinite. The knowledge only of the range of 
variation of x would not enable us to consider x as a stochastic variable; 
to be able to do so, we must introduce in some form or other the 
considerations of probability. For a continuous variable it is as unnatural to 
speak of the probability of any selected single value, as it is to speak of 
the dimension of a single selected point on a line. But just as we speak 
of the length of a segment of a line, we may introduce the notion of the 
probability that x will be confined to a given interval (c, d), part of (a, b). 

In introducing this new notion of probability in any manner whatsoever, 
we must be careful not to fall into contradiction with the laws of 
probability which are assumed as fundamental. To this end, if P(c, d) 
is the probability for x to lie in the interval (c, d), we are led to assume 

    1°  $P(c, d) \ge 0$ 

    2°  $P(a, b) = 1.$ 

The first assumption is an expression of the fact that probability 
can never be negative. The second assumption corresponds to the fact 
that x certainly assumes one out of the totality of its possible values. 

Next, if the interval (c, d) is divided into two adjoining intervals 
(c, e) and (e, d), we assume 

    3°  $P(c, d) = P(c, e) + P(e, d)$ 

in conformity with the theorem of total probability. 

For continuous variables it is furthermore assumed: 4° for an 
infinitesimal interval (c, d), P(c, d) is also infinitesimal. 

Properties 3° and 4° show that P(c, d) is a continuous function of c 
and d and that 

    $$P(c, c) = 0.$$

In other words, the probability that x will assume any given value is 0. 
At the same time P(c, d) represents the probability of any one of the four 
inequalities 

    $$c < x < d; \quad c \le x < d; \quad c < x \le d; \quad c \le x \le d.$$

3. A simple example will serve to clarify these general considerations. 
A small ball of negligible dimensions is made to move on the rim of a 
circular disk. It is set in motion by a vehement impulse and after many 
complete revolutions, retarded by friction and the resistance of the air, 
comes to rest. The variety and complexity of causes influencing the 
motion of the ball make it impossible to foresee the final position of the 
ball when it comes to rest and the whole phenomenon bears characteristic 
features of a play of chance. The stochastic variable associated with this 
chance phenomenon is the distance from a certain definite point on the 
rim (origin) to the final position of the ball, counted in a definite direction, 
for example, clockwise. This variable, when we consider the ball as a 
mere point, may have any value between 0 and the length of the rim. 
The question now arises, how to define the probability that the ball will 
stop in a specified portion of the rim, or else that the variable we consider 
will have a value belonging to a definite interval, part of its total range 
of variation. In trying to define this probability, we must observe the 
fundamental requirements set forth in Sec. 2. Besides that, we must of 
necessity resort to considerations which are not mathematical in their 
nature but are based partly on aprioristic and partly on experimental 
grounds. Suppose we take two equal arcs on the rim. There is nothing 
perceptible a priori that would make the ball stop in one arc rather than 
in another. Besides, actual experiments show that the ball stops in one 
arc approximately the same number of times as in another, and this 
experimental knowledge together with aprioristic considerations suggests 
the assumption that we must attribute equal probabilities to equal arcs, 
irrespective of the position of the arcs on the rim. As soon as we agree on 
this assumption or hypothesis, the problem becomes mathematical and 
can easily be solved. 

Before proceeding to the solution, a remark on the meaning of zero 
probability in connection with continuous variables is not out of place. 
Zero probability in this case does not mean logical impossibility. We 
attribute zero probability to the event that the ball will stop precisely 
at the origin. However, that possibility is not altogether excluded 
so far as we consider the origin and the ball as mere points. The question 
lacks sense if we deal with a material ball and a material rim, no matter 
how small the former and how fine the latter. 

4. A stochastic variable is said to have uniform distribution of 
probability if probabilities attached to two equal intervals are equal. 
This means that P(c, d) depends only upon the length d - c = s of the 
interval (c, d) and accordingly can be denoted simply by P(s). Combining 
two adjoining intervals of the respective lengths s and s' into a 
single interval of length s + s', according to requirement 3°, we must 
have 

    (1)  $$P(s + s') = P(s) + P(s').$$

Suppose now that the interval (a, b) of the length b - a = l, representing 
the whole range of variation of x, is divided into n equal intervals 
of the length l/n. The repeated application of equation (1) gives 

    $$P(l) = nP\left(\frac{l}{n}\right).$$

But by requirement 2° P(l) = 1 and hence 

    $$P\left(\frac{l}{n}\right) = \frac{1}{n}.$$

Again, repeated application of (1) gives 

    $$P\left(\frac{m}{n}l\right) = \frac{m}{n}$$

for any integer $m \le n$. Now let us take any interval of length s. For an 
appropriate m it will contain the interval $\frac{m}{n}l$ and be contained in the 
interval $\frac{m+1}{n}l$; hence, referring to requirements 1° and 3°, we shall have 

    $$\frac{m}{n} \le P(s) \le \frac{m + 1}{n}$$

while 

    $$\frac{m}{n}l \le s < \frac{m + 1}{n}l$$

or 

    $$\frac{m}{n} \le \frac{s}{l} < \frac{m + 1}{n}.$$

Since P(s) and s/l are contained in the same interval of length 1/n, 

    $$\left|P(s) - \frac{s}{l}\right| \le \frac{1}{n}$$

and this being true for an arbitrary n, no matter how large, it follows that 

    $$P(s) = \frac{s}{l}.$$

Thus for a variable x with uniform distribution of probability, the 
probability of assuming a value belonging to an interval of length s is 
given by the ratio of s to the length l of the whole range of variation of x. 
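
The squeezing argument can be followed numerically: for each n the bounds m/n and (m + 1)/n enclose both P(s) and s/l, and their distance 1/n shrinks as n grows. The sketch below is illustrative only; the function name `bracket` and the values s = 1/3, l = 2 are assumptions chosen here.

```python
from fractions import Fraction
from math import floor

def bracket(s, l, n):
    """Bounds m/n <= P(s) <= (m + 1)/n obtained when the whole range of
    length l is divided into n equal parts."""
    m = floor(n * Fraction(s) / Fraction(l))   # largest m with (m/n) l <= s
    return Fraction(m, n), Fraction(m + 1, n)

s, l = Fraction(1, 3), Fraction(2)
for n in (10, 100, 1000, 10_000):
    lower, upper = bracket(s, l, n)
    print(n, float(lower), float(s / l), float(upper))
```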

5. In the general case, when we cannot assume the uniform distribution 
of probability throughout the whole range of variation of x, we let 
ourselves be guided by an analogy with a mass distributed continuously 
over a line. In fact, the distribution of a mass satisfies all the 
requirements set forth for probability. In particular, the mass $\Delta m$ contained 
in an infinitesimal interval $(z, z + \Delta z)$ is also infinitesimal and the mean 
density 

    $$\frac{\Delta m}{\Delta z}$$

is generally supposed to tend, with $\Delta z$ converging to 0, to a limit called 
"density at the point z." If this density $\rho(z)$ is known, the mass 
contained in any interval (c, d) is represented by an integral 

    $$\int_c^d \rho(z)\,dz.$$

Following this analogy we admit that the mean density of probability 

    $$\frac{P(z, z + \Delta z)}{\Delta z}$$

tends to a limit f(z), density of probability at the point z, when the length 
of the interval $\Delta z$ tends to 0. Hence, again the probability corresponding 
to an interval (c, d) will be represented by the integral 

    $$P(c, d) = \int_c^d f(z)\,dz.$$

This expression satisfies all the requirements of Sec. 2 if the density of 
the probability f(z) is subject to two conditions: 

    (a)  $f(z) \ge 0$ for all z in (a, b). 

    (b)  $\int_a^b f(z)\,dz = 1.$ 

The second condition implies, of course, the existence of the integral itself. 
But in all cases of any importance the density is continuous, save for 
discontinuities of the simplest kind which do not cause any doubts as 
to the existence of the above integral. 

From the general expression of P(c, d) it follows that for an 
infinitesimal interval (z, z + dz) the probability is given by f(z)dz neglecting 
infinitesimals of a higher order. For the uniform distribution of 
probability over an interval of length l the density is constant and = 1/l. 

In other cases we cannot expect to obtain a definite expression for 
density unless the variable itself is sufficiently characterized by 
additional conditions, either hypothetical or implied by the problem. Thus, 
for instance, in applications of probability to problems of theoretical 
physics, the physicists have succeeded in obtaining definite probability 
distributions by invoking physical laws of admitted universal validity 
together with some plausible hypotheses. 

6. The interval containing all possible values of a stochastic variable 
may be finite or infinite according to the nature of that variable. 
However, in all cases we may take the largest possible interval from $-\infty$ to 
$+\infty$; to this end it suffices to define the density outside of the originally 
given interval as being = 0. Then the density will be defined for all 
real values of z and will satisfy the conditions: 

    (a)  $f(z) \ge 0$ for all z 

    (b)  $\int_{-\infty}^{\infty} f(z)\,dz = 1.$ 

Furthermore, the probability for x to be in any interval (c, d) will be 
given by 

    $$\int_c^d f(z)\,dz.$$

In particular, taking $c = -\infty$ and writing t instead of d, 

    $$F(t) = \int_{-\infty}^{t} f(z)\,dz$$

represents the probability that x will not exceed or will be less than t. 
Considered as a function of t, F(t) is never decreasing and varies between 
$F(-\infty) = 0$ and $F(+\infty) = 1$. It is called the "distribution function of 
probability." In case x has uniform distribution of probability over an 
interval (a, b) its distribution function is evidently defined as follows: 

    $F(t) = 0$ for $t < a$ 

    $F(t) = \frac{t - a}{b - a}$ for $a \le t \le b$ 

    $F(t) = 1$ for $t > b$. 

Its graph is shown in Fig. 1 on page 240. 
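
The distribution function of the uniform distribution, and its use in finding the probability of an interval, may be written out as follows. This is a minimal sketch; the function names and the interval (2, 4) are assumptions introduced here for illustration.

```python
def uniform_distribution_function(t, a, b):
    """F(t) for a variable with uniform distribution over (a, b)."""
    if t < a:
        return 0.0
    if t > b:
        return 1.0
    return (t - a) / (b - a)

def interval_probability(c, d, a, b):
    """P(c, d) = F(d) - F(c), the probability for x to lie in (c, d)."""
    return (uniform_distribution_function(d, a, b)
            - uniform_distribution_function(c, a, b))

a, b = 2.0, 4.0
for t in (1.0, 2.0, 2.5, 3.5, 4.0, 5.0):
    print(t, uniform_distribution_function(t, a, b))
print(interval_probability(2.5, 3.5, a, b))
```

The last value is the ratio of the length of (2.5, 3.5) to the length of the whole range, in agreement with Sec. 4.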
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7. The definition of mathematical expectation can easily be extended 
to continuous variables; namely, the expectation of x or the mean value 
of X is defined by 

E{x) = f“jf(z)dz 

provided this integral exists. Similarly, the mathematical expectation 
of any function <p(x) is given by 

Of course, the existence of the integral in the right member is presupposed 
again. When this integral does not exist, it is meaningless to spe^ak of 

the mathematical expectation of (p{x). 

__The mathematical expectation of th(i 

a b + 0 O power x” with positive integer exponent 

is called the moment of the order n or 
nth moment. We shall denote it by so that 

mn = z^f{z)dz. 

The dispersion D and the standard deviation of x are defined in the same 
way as in Chap. IX; namely, 

D = = E(x — mj)- — J* mi)-J{z)dz = — m\. 

Often it is advisable to consider the mathematical expectation of |x|“ 
where a may be any real number, ordinarily positive. This expcH’tation 
is called the ‘^absolute moment of the order Its expression is 

= f“jz\°f{z)dz, 

and it is evident that 

m^k = |w2fc+i| ^ n>2k+\- 

The mathematical expectation of the function

$$e^{itx}$$

where t is a real variable, is of the utmost importance. It is called the
"characteristic function" of distribution and is defined by

$$\varphi(t) = \int_{-\infty}^{\infty} e^{itz} f(z)\,dz.$$

Since $f(z) \ge 0$ and

$$\int_{-\infty}^{\infty} f(z)\,dz = 1$$

the integral defining $\varphi(t)$ is always convergent and

$$|\varphi(t)| \le 1.$$


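The distribution is completely determined by its characteristic function.
Because by the Fourier theorem

$$\frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx}\,dt \int_{-\infty}^{\infty} e^{itz} f(z)\,dz = f(x)$$

at all points of continuity of f(x). But the left-hand member is

$$\frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} \varphi(t)\,dt$$

by the definition of $\varphi(t)$ and so

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} \varphi(t)\,dt.$$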





8. To illustrate the preceding general explanations we shall now consider
a few examples.

Example 1. Let x be a variable with uniform distribution of probability over
the interval (0, l). The density of this distribution being constant, namely 1/l,
the mean value of x is

$$m_1 = \frac{1}{l} \int_0^l z\,dz = \frac{l}{2}$$

and the second moment

$$m_2 = \frac{1}{l} \int_0^l z^2\,dz = \frac{l^2}{3}.$$

Hence, the square of the standard deviation

$$\sigma^2 = m_2 - m_1^2 = \frac{l^2}{12}.$$


This simple example may be used to illustrate a remark made at the beginning of this
chapter, that sometimes it is profitable to substitute for a variable with a finite but
large number of values a fictitious continuous variable. Suppose that in flipping a coin
n times, we mark heads by 1 and tails by 0, thus obtaining a sequence comprising n
units and zeros altogether, disposed in the order of trials. This sequence may be
considered as successive digits in the binary representation of a fraction:

$$X = \frac{\alpha_1}{2} + \frac{\alpha_2}{2^2} + \cdots + \frac{\alpha_n}{2^n}$$

contained between 0 and 1. X may be considered as a stochastic variable with $2^n$
values each having the probability $1/2^n$. The probability $\Pi(\alpha, \beta)$ that X will be
contained in the interval $(\alpha, \beta)$, or more definitely that X will satisfy the inequalities

$$\alpha < X \le \beta$$

is obviously obtained by multiplying the number of integers N contained in the limits

$$2^n \alpha < N \le 2^n \beta$$

by $1/2^n$. Now there are exactly

$$[2^n \beta] - [2^n \alpha] = 2^n(\beta - \alpha) + \theta; \qquad -1 < \theta < 1$$

such integers; hence

$$\Pi(\alpha, \beta) = \beta - \alpha + \frac{\theta}{2^n}.$$

If n is even moderately large, this probability is very near to the probability

$$P(\alpha, \beta) = \beta - \alpha$$

that a fictitious variable x with uniform distribution over the interval (0, 1) will
assume a value in the interval $(\alpha, \beta)$. The first two moments of the variable X are,
respectively

$$M_1 = \frac{0 + 1 + 2 + \cdots + (2^n - 1)}{2^{2n}} = \frac{1}{2} - \frac{1}{2^{n+1}}$$

$$M_2 = \frac{0^2 + 1^2 + 2^2 + \cdots + (2^n - 1)^2}{2^{3n}} = \frac{1}{3} - \frac{1}{2^{n+1}} + \frac{1}{3 \cdot 2^{2n+1}}$$

and differ little from the respective moments 1/2 and 1/3 of the fictitious continuous
variable. Without losing anything essential, we here gain considerably in simplicity
by substituting a fictitious continuous variable for the discontinuous variable X.
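The closed expressions for $M_1$ and $M_2$ can be checked by direct summation. The following Python sketch is an illustration added to the text, not part of the original exposition; the function names are assumptions chosen for readability. It uses exact rational arithmetic, so the comparison involves no rounding.

```python
from fractions import Fraction


def binary_fraction_moments(n):
    """Exact first and second moments of X = k / 2**n for k = 0 .. 2**n - 1,
    each value having probability 1 / 2**n (n flips of a fair coin)."""
    size = 2 ** n
    m1 = sum(Fraction(k, size) for k in range(size)) / size
    m2 = sum(Fraction(k, size) ** 2 for k in range(size)) / size
    return m1, m2


def closed_form_moments(n):
    """The closed expressions for M_1 and M_2 stated in Example 1."""
    m1 = Fraction(1, 2) - Fraction(1, 2 ** (n + 1))
    m2 = (Fraction(1, 3) - Fraction(1, 2 ** (n + 1))
          + Fraction(1, 3 * 2 ** (2 * n + 1)))
    return m1, m2


if __name__ == "__main__":
    for n in (1, 4, 10):
        direct = binary_fraction_moments(n)
        closed = closed_form_moments(n)
        # Compare with the continuous moments 1/2 and 1/3.
        print(n, direct == closed, float(direct[0]), float(direct[1]))
```

Already for n = 10 the two moments agree with 1/2 and 1/3 to about three decimal places, which is the sense in which the continuous variable may replace the discontinuous one.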

Example 2. A thin bar can rotate freely about its middle point P. It is set in
motion and after several revolutions comes to a stop pointing toward a point X on a
line l. The position of the bar is determined by an angle $\theta$
formed by itself and the perpendicular PO dropped from P on l; $\theta$
varies between the limits $-\pi/2$ and $\pi/2$ and its distribution is
supposed to be uniform. The position of X is determined by
its distance OX = x from O, this distance being positive or negative
according as X is to the right or to the left of the point O.
It is required to find the distribution of the probability of x. The relation between $\theta$
and x is

$$x = a \operatorname{tg} \theta$$

if OP = a or, conversely,

$$\theta = \operatorname{arc\,tg} \frac{x}{a}.$$

[Fig. 2]

By differentiation we find the relation between $d\theta$ and dx:

$$d\theta = \frac{a\,dx}{a^2 + x^2}.$$

Now, by hypothesis, the probability that the angle OPX will be contained between $\theta$ and
$\theta + d\theta$ is

$$\frac{d\theta}{\pi} = \frac{1}{\pi} \frac{a\,dx}{a^2 + x^2}.$$

And the probability that the distance of X from O will be contained between x and
x + dx is the same. Hence, the density of probability for the variable x is

$$f(x) = \frac{1}{\pi} \frac{a}{a^2 + x^2}$$

and the probability corresponding to a finite interval (c, d) is given by

$$P(c, d) = \frac{1}{\pi} \int_c^d \frac{a\,dx}{a^2 + x^2}.$$

For the whole range of variation of x

$$\frac{1}{\pi} \int_{-\infty}^{\infty} \frac{a\,dx}{a^2 + x^2} = 1$$

as it should be. However, we cannot speak of the mean value of x or of moments of
higher order, since the integrals

$$\int_{-\infty}^{\infty} \frac{x\,dx}{a^2 + x^2}, \qquad \int_{-\infty}^{\infty} \frac{x^2\,dx}{a^2 + x^2}$$

have no meaning. But the characteristic function $\varphi(t)$ exists and is given by

$$\varphi(t) = \frac{a}{\pi} \int_{-\infty}^{\infty} \frac{e^{itx}\,dx}{a^2 + x^2} = e^{-a|t|}.$$
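The mechanism of Example 2 is easy to imitate numerically. The sketch below is an added illustration under stated assumptions: the function names, the seed, and the number of trials are hypothetical choices, and the closed form $(\operatorname{arc\,tg}(d/a) - \operatorname{arc\,tg}(c/a))/\pi$ is obtained here by evaluating the integral for P(c, d), a step the text leaves to the reader.

```python
import math
import random


def interval_probability(c, d, a):
    """P(c < x < d) for the density a / (pi * (a^2 + x^2)).
    The antiderivative of the density is arctan(x / a) / pi."""
    return (math.atan(d / a) - math.atan(c / a)) / math.pi


def simulate_bar(c, d, a, trials=200_000, seed=0):
    """Draw the angle theta uniformly in (-pi/2, pi/2), set x = a * tan(theta),
    and return the observed frequency of c < x < d."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        theta = rng.uniform(-math.pi / 2, math.pi / 2)
        x = a * math.tan(theta)
        if c < x < d:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    a, c, d = 2.0, -1.0, 3.0
    print(interval_probability(c, d, a), simulate_bar(c, d, a))
```

The observed frequency settles near the integral, while the sample mean of x, if one cares to compute it, does not settle at all; this is the practical meaning of the statement that the mean value does not exist.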


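Example 3. One of the most important distributions (theoretically and practically)
is the so-called "Gaussian" or "normal" distribution. The density of this
distribution is given by

$$f(z) = K e^{-h^2 (z - a)^2}$$

with three parameters K, h, a. However, only two of these parameters are
independent, since we must have

$$\int_{-\infty}^{\infty} f(z)\,dz = K \int_{-\infty}^{\infty} e^{-h^2 (z - a)^2}\,dz = K \frac{\sqrt{\pi}}{h} = 1;$$

whence

$$K = \frac{h}{\sqrt{\pi}}$$

and finally

$$f(z) = \frac{h}{\sqrt{\pi}} e^{-h^2 (z - a)^2}.$$

To find the meaning of a and h we observe that the mean value of our variable is

$$\frac{h}{\sqrt{\pi}} \int_{-\infty}^{\infty} z\,e^{-h^2 (z - a)^2}\,dz = a$$

since

$$\int_{-\infty}^{\infty} (z - a)\,e^{-h^2 (z - a)^2}\,dz = 0.$$

Thus a has the meaning of the mean value of the normally distributed variable x.
The square of the standard deviation is given by

$$\sigma^2 = \frac{h}{\sqrt{\pi}} \int_{-\infty}^{\infty} (z - a)^2 e^{-h^2 (z - a)^2}\,dz = \frac{1}{2h^2}$$

whence

$$h = \frac{1}{\sigma \sqrt{2}}.$$

Thus for the normally distributed variable with the mean a and standard deviation $\sigma$
the density of probability is

$$f(z) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(z - a)^2}{2\sigma^2}}.$$

Finally, for the variable u = x − a with the mean value 0 and the same standard
deviation, the expression of density takes the simplest form

$$\frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{u^2}{2\sigma^2}}$$

and the distribution function of probability is represented by the integral

$$F(t) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{t} e^{-\frac{z^2}{2\sigma^2}}\,dz.$$

The curve of density

$$y = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{x^2}{2\sigma^2}}$$

or the probability curve has a bell-shaped form as shown in the figure corresponding
to $\sigma = 1$. It has a single maximum corresponding to x = 0 and on both sides of this
maximum it rapidly approaches the x axis.

[Fig. 3]

The characteristic function of normal distribution has a very simple form. By
definition

$$\varphi(t) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{x^2}{2\sigma^2}} \cos tx\,dx.$$

But as

$$\int_{-\infty}^{\infty} e^{-\alpha x^2} \cos \beta x\,dx = \sqrt{\frac{\pi}{\alpha}}\, e^{-\frac{\beta^2}{4\alpha}} \qquad (\alpha > 0)$$

we find that

$$\varphi(t) = e^{-\frac{\sigma^2 t^2}{2}}.$$

The moments of normal distribution (with the mean = 0) can now be easily found.
From the definition of the characteristic function it follows that

$$i^n m_n = \left( \frac{d^n \varphi(t)}{dt^n} \right)_{t=0}.$$

In our case

$$\varphi(t) = 1 - \frac{\sigma^2}{2} t^2 + \frac{1}{1 \cdot 2} \left( \frac{\sigma^2}{2} \right)^2 t^4 - \frac{1}{1 \cdot 2 \cdot 3} \left( \frac{\sigma^2}{2} \right)^3 t^6 + \cdots$$

whence

$$\left( \frac{d^{2k+1} \varphi(t)}{dt^{2k+1}} \right)_{t=0} = 0; \qquad \left( \frac{d^{2k} \varphi(t)}{dt^{2k}} \right)_{t=0} = (-1)^k\, 1 \cdot 3 \cdot 5 \cdots (2k - 1)\, \sigma^{2k}.$$

Thus

$$m_{2k+1} = 0$$

$$m_{2k} = 1 \cdot 3 \cdot 5 \cdots (2k - 1)\, \sigma^{2k}.$$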
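The formula for the even moments can be compared with a direct numerical evaluation of the defining integral. The sketch below is an added illustration, not the author's method: it replaces the characteristic-function argument by a plain midpoint rule, and the function names, the truncation at 12 standard deviations, and the number of steps are all assumptions made for the example.

```python
import math


def normal_density(u, sigma):
    """Density of the normal distribution with mean 0 and standard deviation sigma."""
    return math.exp(-u * u / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))


def moment_by_quadrature(n, sigma, half_width=12.0, steps=200_000):
    """Midpoint-rule approximation of the integral of u^n * f(u) du
    over (-half_width * sigma, +half_width * sigma)."""
    lo = -half_width * sigma
    h = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        u = lo + (i + 0.5) * h
        total += u ** n * normal_density(u, sigma)
    return total * h


def even_moment_closed_form(k, sigma):
    """m_{2k} = 1 * 3 * 5 * ... * (2k - 1) * sigma^(2k)."""
    product = 1
    for odd in range(1, 2 * k, 2):
        product *= odd
    return product * sigma ** (2 * k)


if __name__ == "__main__":
    sigma = 1.5
    for k in (1, 2, 3):
        print(2 * k, moment_by_quadrature(2 * k, sigma), even_moment_closed_form(k, sigma))
```

The odd moments computed in the same way come out as zero up to rounding, in agreement with $m_{2k+1} = 0$.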


Case of Two or More Variables 
9. By analogy it is easy now to extend the notion of probability to
two or more variables considered simultaneously. A pair of special
values x, y of two stochastic variables X, Y will be represented
geometrically by a point with the coordinates x, y referred to a rectangular
system of axes. The domain S of all the possible values of X and Y will
be represented by a portion (finite or infinite) of a plane with a definite
boundary unless this domain coincides with the whole plane. The
probability that the point x, y should belong to an infinitesimal area
dx dy will be expressed by the product $\varphi(x, y)\,dx\,dy$ where the function
$\varphi(x, y)$ is again called the density of probability at the point x, y. The
density of probability must satisfy two requirements: it is non-negative
in the whole domain S and

$$\iint_S \varphi(x, y)\,dx\,dy = 1$$

where the double integral is extended over all the domain S. The
probability for the point x, y to be located in a given domain $\sigma$ is then
given by the integral

$$\iint_\sigma \varphi(x, y)\,dx\,dy$$

extended over $\sigma$.

If $\varphi(x, y)$ is a constant in S, the distribution of probability is called
uniform. The domain S in this case must be finite and if its area is
denoted by the same letter, then

$$\varphi(x, y) = \frac{1}{S}.$$

The probability for the point x, y to be within the domain $\sigma$ will be given
by the ratio

$$\frac{\sigma}{S}$$

denoting the area of the domain $\sigma$ by $\sigma$ again.
10. We can always substitute the whole plane for the domain S.
To that end it suffices to set

$$\varphi(x, y) = 0$$

in all points not belonging to S. We shall then have

$$\varphi(x, y) \ge 0$$

everywhere and

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \varphi(x, y)\,dx\,dy = 1.$$

By doing so we have the advantage of stating results in a perfectly general
form without mentioning the domain S. However, in dealing with
particular problems, it is more convenient to consider only those points
which can actually represent simultaneous values of the variables.
The probability of simultaneous inequalities

$$a < X < b; \qquad c < Y < d$$

according to the general definition is represented by the double integral

$$\int_a^b \int_c^d \varphi(x, y)\,dx\,dy.$$

This corresponds to the compound probability of two events and we must
see that the fundamental theorem of compound probabilities continues
to hold. Taking $c = -\infty$, $d = +\infty$ the repeated integral

$$\int_a^b dx \int_{-\infty}^{\infty} \varphi(x, y)\,dy$$

represents the probability P(a, b) for the variable X (as if it were
considered alone without any reference to Y) to have its value in (a, b).
The function

$$f(x) = \int_{-\infty}^{\infty} \varphi(x, y)\,dy$$

represents the density of probability of X. Thus

$$P(a, b) = \int_a^b f(x)\,dx.$$

In a similar way

$$F(y) = \int_{-\infty}^{\infty} \varphi(x, y)\,dx$$

represents the density of the probability of Y; and the probability Q(c, d)
that this variable has its value in (c, d) is given by

$$Q(c, d) = \int_c^d F(y)\,dy.$$
Now the double integral

$$\int_a^b \int_c^d \varphi(x, y)\,dx\,dy$$

can be written in either of the forms

$$\int_a^b \int_c^d \varphi(x, y)\,dx\,dy = \int_a^b f(x)\,dx \cdot \int_c^d f_1(y)\,dy$$

$$\int_a^b \int_c^d \varphi(x, y)\,dx\,dy = \int_c^d F(y)\,dy \cdot \int_a^b f_2(x)\,dx$$

where

$$f_1(y) = \frac{\int_a^b \varphi(x, y)\,dx}{\int_a^b f(x)\,dx}, \qquad f_2(x) = \frac{\int_c^d \varphi(x, y)\,dy}{\int_c^d F(y)\,dy}$$

may be considered as densities of conditional probabilities, respectively,
for Y when it is known that X has a value in (a, b) and for X when it is
known that Y has a value in (c, d). The preceding expressions for the
probability of the simultaneous inequalities

$$a < X < b; \qquad c < Y < d$$

have the same form as the theorem of compound probability and may be
considered as its extension. The conditional probability for Y to have
its value in (c, d) when it is known that X has its value in (a, b) is given by

$$\int_c^d f_1(y)\,dy.$$

Now, we define variables X and Y as independent when the probability
for Y to be in (c, d) is not affected by the knowledge that X belongs
to (a, b), which means that

$$\int_c^d f_1(y)\,dy = \int_c^d F(y)\,dy$$

or

$$\int_a^b \int_c^d \varphi(x, y)\,dx\,dy = \int_c^d F(y)\,dy \cdot \int_a^b f(x)\,dx$$

and, since intervals (a, b) and (c, d) are arbitrary,

$$\varphi(x, y) = f(x) \cdot F(y)$$

at points of continuity. Hence, the density of probability for two
independent variables is a product of a function of x alone by a function
of y alone. Conversely, when this condition is satisfied the variables are
independent. For independent variables the probability of the
simultaneous inequalities

$$a < X < b$$
$$c < Y < d$$

has a simple expression

$$\int_a^b f(x)\,dx \cdot \int_c^d F(y)\,dy$$

which is the product of the probability for X to have its value in the
interval (a, b) by the probability for Y to have its value in the interval
(c, d), in perfect analogy with the compound probability of two
independent events.

Finally, the mathematical expectation of any function $\psi(x, y)$ can be
defined by

$$E[\psi(x, y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \psi(x, y)\,\varphi(x, y)\,dx\,dy$$

provided the integral in the right member exists.

11. It is hardly necessary to dwell at length upon the case of several
stochastic variables. A system of particular values $x_1, x_2, \ldots x_n$ of
n stochastic variables $X_1, X_2, \ldots X_n$ may be considered as a point in
n-dimensional space. The density of probability is a non-negative
function $\varphi(x_1, x_2, \ldots x_n)$ defined in the whole space and satisfying the
condition

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \varphi(x_1, x_2, \ldots x_n)\,dx_1 dx_2 \cdots dx_n = 1.$$

The probability for a point representing $X_1, X_2, \ldots X_n$ to be located
in a given domain $\sigma$ is given by the integral

$$\int \cdots \int_\sigma \varphi(x_1, x_2, \ldots x_n)\,dx_1 dx_2 \cdots dx_n$$

extended over $\sigma$. In the case of uniform distribution of probability,
$\varphi(x_1, x_2, \ldots x_n)$ is by definition a constant in a certain finite region
of space and = 0 outside of that region. If V is the volume of that
region and v the volume of the domain $\sigma$, the ratio v/V gives the
probability that a point belongs to $\sigma$.

The probability of the simultaneous inequalities

$$a_1 < X_1 < b_1; \quad a_2 < X_2 < b_2; \quad \ldots \quad a_n < X_n < b_n$$

is given by the integral

$$\int_{a_1}^{b_1} \int_{a_2}^{b_2} \cdots \int_{a_n}^{b_n} \varphi(x_1, x_2, \ldots x_n)\,dx_1 dx_2 \cdots dx_n$$

which, by introduction of the conditional probabilities as in the case of
two variables, can be put into the form of a product of n integrals in a
manner perfectly analogous to the expression of the probability of a
compound event with n components. Finally, the variables are
independent if the density $\varphi(x_1, x_2, \ldots x_n)$ is a product of n functions
depending only upon $x_1, x_2, \ldots x_n$, respectively, and conversely.

The expression

$$E[\psi(x_1, x_2, \ldots x_n)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \psi \varphi\,dx_1 dx_2 \cdots dx_n$$

serves to define the mathematical expectation of any function
$\psi(x_1, x_2, \ldots x_n)$ of $X_1, X_2, \ldots X_n$.

12. Since in introducing the extended idea of probability we took
care to preserve the fundamental theorems of the calculus of probability,
we may be sure that other theorems derived from them will hold for
continuous variables. In particular, theorems concerning mathematical
expectation and the fundamental lemma in Chap. X, Sec. 1, hold for
continuous variables. Upon this basis, as we have seen, was built the
proof of the law of large numbers. Hence, this important theorem
applies equally to continuous variables.

Geometrical Problems 

13. A few geometrical problems will afford a good illustration of the
foregoing general principles.

Problem 1. A rectilinear segment AB is divided by a point C into
two parts AC = a, CB = b. Points X and Y are taken at random on AC and CB,
respectively. What is the probability that AX, XY, BY can form a triangle?

Solution. We must first agree upon the meaning of the expression
"at random." The idea suggested by this expression implies that the
way of selecting points X and Y gives no preference to
any point of AC and CB, respectively. Consequently,
variables x = AX and y = BY may be assumed to have
uniform distribution of probability. The domain of the
point x, y is a rectangle OMPN with the sides OM = a,
ON = b. In order that AX, XY, BY can form a triangle the
following inequalities must be fulfilled:

$$x < (a + b - x - y) + y \quad \text{or} \quad x < a + b - x$$

$$y < (a + b - x - y) + x \quad \text{or} \quad y < a + b - y$$

$$a + b - x - y < x + y.$$

[Fig. 5]

These inequalities are equivalent to

$$x < \frac{a + b}{2}, \qquad y < \frac{a + b}{2}, \qquad x + y > \frac{a + b}{2}.$$

To interpret them geometrically through P draw a line QPR making
the angle RQO = 45°. From the mid-point V of QR drop the perpendiculars
VS, VW on OX, OY. Then the preceding inequalities limit the position
of the point x, y to the shaded area SVW, whose part TSU is contained
in the rectangle OMPN. Variables x and y are independent and have
uniform distribution. Hence, the density of probability of the pair
x, y is constant and the probability that the point x, y is in the triangle
TSU will be

$$\frac{\text{Area } TSU}{\text{Area } OMPN} = \frac{\tfrac{1}{2} b^2}{ab} = \frac{1}{2} \frac{b}{a}.$$

At the same time this is the probability for AX, XY, BY to form a
triangle.
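The answer b/(2a) can be put to an experimental test in the spirit of the text. The sketch below is an added illustration; the function names, seed, and trial count are hypothetical, and it assumes b ≤ a, which is the configuration in which the triangle TSU has area b²/2.

```python
import random


def forms_triangle(p, q, r):
    """True when three lengths satisfy the strict triangle inequalities."""
    return p < q + r and q < p + r and r < p + q


def problem1_frequency(a, b, trials=200_000, seed=0):
    """Take X uniformly on AC (AX = x) and Y uniformly on CB (BY = y),
    and return how often AX, XY, BY can form a triangle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0, a)          # AX
        y = rng.uniform(0, b)          # BY
        middle = a + b - x - y         # XY
        if forms_triangle(x, middle, y):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    a, b = 3.0, 2.0
    print(problem1_frequency(a, b), b / (2 * a))
```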

Problem 2. On a line AB two points $X_1$, $X_2$ are taken at random.
What is the probability that $AX_1$, $X_1X_2$, $X_2B$ can form a triangle?

[Fig. 6. Fig. 7.]

Solution. Variables $AX_1 = x_1$, $AX_2 = x_2$ are independent and have
uniform distribution of probability. The domain of all possible positions
of the point $x_1, x_2$ is a square with the side AB = l. Positions of this
point when $AX_1$, $X_1X_2$, $X_2B$ form a triangle can be characterized as
follows. First, if $X_1$ precedes $X_2$, we have

$$x_2 - x_1 < x_1 + l - x_2 \quad \text{or} \quad x_2 - x_1 < \frac{l}{2}$$

$$x_1 < x_2 - x_1 + l - x_2 \quad \text{or} \quad x_1 < \frac{l}{2}$$

$$l - x_2 < x_2 - x_1 + x_1 \quad \text{or} \quad x_2 > \frac{l}{2}$$

which means that $x_1, x_2$ belongs to the triangle ONP, the definition of
which is evident if L, M, N, P are mid-points of the sides of the square
ABCD. Second, if $X_1$ follows $X_2$, we have

$$x_1 - x_2 < \frac{l}{2}; \qquad x_2 < \frac{l}{2}; \qquad x_1 > \frac{l}{2}$$

and these inequalities define the area OLM. Since the distribution of
$x_1, x_2$ is uniform, the required probability is

$$\frac{\text{Area } OLM + \text{Area } ONP}{\text{Area } ABCD} = \frac{\tfrac{1}{8} l^2 + \tfrac{1}{8} l^2}{l^2} = \frac{1}{4}.$$
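Since the distribution is uniform over a square, the probability 1/4 can also be approached by counting cells of a fine grid instead of drawing random points. The sketch below is an added illustration with hypothetical names; it places the point $x_1, x_2$ at the centers of an N × N grid and works in integer units of l/(2N), so that every comparison is exact.

```python
def problem2_grid_fraction(n_cells):
    """Fraction of grid-cell centers (x1, x2) in the square of side l for which
    AX1, X1X2, X2B can form a triangle. Lengths are measured in units of
    l / (2 * n_cells), so all quantities are integers."""
    total_length = 2 * n_cells
    favorable = 0
    for i in range(n_cells):
        x1 = 2 * i + 1
        for j in range(n_cells):
            x2 = 2 * j + 1
            first = min(x1, x2)                  # distance from A to the nearer point
            middle = abs(x2 - x1)                # distance between the two points
            last = total_length - max(x1, x2)    # distance from the farther point to B
            # Three parts of a segment form a triangle exactly when each part
            # is less than half of the whole.
            if max(first, middle, last) * 2 < total_length:
                favorable += 1
    return favorable / (n_cells * n_cells)


if __name__ == "__main__":
    for n_cells in (10, 100, 400):
        print(n_cells, problem2_grid_fraction(n_cells))
```

For even N the count works out to 1/4 − 1/(2N), so the grid fraction approaches 1/4 from below as the grid is refined.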

Problem 3. A chord is drawn at random in a given circle. What is
the probability that it is greater than the side of the equilateral triangle
inscribed in that circle?

Solution 1. The position of the chord drawn at random can be determined
by its distance from the center of the circle. This distance may
vary between 0 and R, the radius of the circle. The chord is greater
than the side of the equilateral triangle inscribed in the circle if its
distance from the center is less than $\tfrac{1}{2}R$. Hence, the required probability

$$p_1 = \frac{1}{2}.$$

Solution 2. Through one end of the chord, draw a tangent AT.
The angle $\varphi$ varying from 0° to 180° determines the position of the
chord. If it is greater than the side of the inscribed equilateral
triangle, the angle $\varphi$ must lie between 60° and 120°.
Hence the answer

$$p_2 = \frac{120° - 60°}{180°} = \frac{1}{3}.$$

[Fig. 8]

The fact that we obtain two different numbers for the same probability
seems paradoxical, and the problem itself is known as "Bertrand's
paradox." However, going attentively over both solutions, we discover
that we are really dealing with two different problems. In the first
solution it was assumed that the distance of the chord from the center
has uniform distribution, while in the second solution the distribution
of the angle $\varphi$ was taken as uniform. The second solution may be
considered reasonable if a thin bar or a needle can rotate freely about A
and if, being set in motion, it determines the chord AB by its ultimate
position. On the other hand, the first solution is acceptable if a circular
disk is thrown upon a board ruled with parallel lines distant from one
another by the diameter of the disk. The intersection of the disk with
one of the lines determines a chord, and the probability that it is greater
than the side of the inscribed equilateral triangle can reasonably be
assumed to be 1/2.

A general remark applies to all problems of this kind. When a
certain geometrical element, such as a point or a line, is supposed to be
taken at random, it should be clearly indicated by what kind of
mechanism this is to be done. Only then the hypothetically assumed
distribution can be put to an experimental test and either confirmed
(approximately) or rejected.
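The two mechanisms described for Bertrand's problem can be imitated side by side. The following sketch is an added illustration; the function names, seed, and trial count are hypothetical. The first sampler takes the distance of the chord from the center as uniform; the second takes the angle between the chord and the tangent at A as uniform, the chord length then being 2R sin φ.

```python
import math
import random


def chord_from_distance(radius, rng):
    """Chord whose distance from the center is uniform on (0, radius)."""
    distance = rng.uniform(0, radius)
    return 2 * math.sqrt(radius * radius - distance * distance)


def chord_from_tangent_angle(radius, rng):
    """Chord through a fixed point A making a uniform angle phi in (0, pi)
    with the tangent at A; its length is 2 * radius * sin(phi)."""
    phi = rng.uniform(0, math.pi)
    return 2 * radius * math.sin(phi)


def frequency_longer_than_side(sampler, radius=1.0, trials=200_000, seed=0):
    """Observed frequency of chords longer than the side of the inscribed
    equilateral triangle, which equals radius * sqrt(3)."""
    rng = random.Random(seed)
    side = radius * math.sqrt(3)
    hits = sum(1 for _ in range(trials) if sampler(radius, rng) > side)
    return hits / trials


if __name__ == "__main__":
    print(frequency_longer_than_side(chord_from_distance))       # compare with 1/2
    print(frequency_longer_than_side(chord_from_tangent_angle))  # compare with 1/3
```

Each frequency agrees with its own solution, which illustrates the remark that the two answers belong to two different experiments.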

14. Buffon's Needle Problem. A board is ruled with equidistant
parallel lines, the width of the strip between two consecutive lines being
d. A needle so fine that it can be likened to a rectilinear segment of the
length l < d is thrown on the board. What is the probability that the
needle will intersect one of the lines (naturally not more than one)?

Solution. This is the oldest problem dealing with geometrical
probabilities. It was mentioned by Buffon, the celebrated French
naturalist of the eighteenth century, in the Proceedings of the Paris
Academy of Sciences (1733) and later reproduced with its solution in
Buffon's book "Essai d'arithmétique morale," published in 1777.

Let us determine the position of the needle by the distance OP = x of
its middle point from the nearest line, and the acute angle $\varphi$ between OP
and the needle. Variables x and $\varphi$ may be considered as independent.
Furthermore, x and $\varphi$ vary respectively between 0 and $\tfrac{1}{2}d$, and 0 and
$\pi/2$. As a hypothesis we assume the distribution of probability for
x and $\varphi$ as uniform. The domain of x, $\varphi$ is a rectangle OABC with
OA = $\pi/2$, OC = d/2.

[Fig. 9. Fig. 10.]

Now, the needle intersects one of the lines if

$$x < \frac{l}{2} \cos \varphi$$

and then the point x, $\varphi$ lies in the shaded area below the curve

$$x = \frac{l}{2} \cos \varphi.$$

Since the distribution of x, $\varphi$ is uniform, the required probability will be

$$p = \frac{\text{Area } OAD}{\text{Area } OABC}.$$

But

$$\text{Area } OAD = \frac{l}{2} \int_0^{\pi/2} \cos \varphi\,d\varphi = \frac{l}{2}$$

$$\text{Area } OABC = \frac{\pi}{2} \cdot \frac{d}{2}$$

and consequently

$$p = \frac{2l}{\pi d}.$$

On pages 112-113 an account was given of experiments made by several
authors in connection with Buffon's problem. They all show good
agreement with the theory and indirectly confirm the hypothesis assumed in
deriving the above expression for probability.
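An experiment of this kind can be imitated numerically. The sketch below is an added illustration, not a record of the experiments cited; the function name, seed, and trial count are hypothetical. It draws x uniformly in (0, d/2) and φ uniformly in (0, π/2), exactly as in the hypothesis above, and counts how often x < (l/2) cos φ.

```python
import math
import random


def buffon_frequency(needle, spacing, trials=200_000, seed=0):
    """Observed frequency with which a needle of length `needle` (< spacing)
    intersects one of the parallel lines a distance `spacing` apart."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0, spacing / 2)       # distance of the middle point from the nearest line
        phi = rng.uniform(0, math.pi / 2)     # acute angle between OP and the needle
        if x < (needle / 2) * math.cos(phi):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    l, d = 1.0, 2.0
    print(buffon_frequency(l, d), 2 * l / (math.pi * d))
```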

15. Extension of Buffon's Problem. A thin plate in the shape of a
convex polygon, of dimensions so small that it cannot intersect two of
the lines simultaneously, is thrown on a board ruled, as in Buffon's needle
problem. What is the probability that the boundary of the plate will
intersect one of the lines?

Solution. Suppose that the polygonal boundary has five sides.
Let these sides (and their lengths) be denoted by

$$\alpha, \beta, \gamma, \delta, \epsilon.$$

Each of them is shorter than the distance d between two consecutive
lines. On account of convexity, a line can intersect either none or two
(and only two) sides. Accordingly, combining sides in pairs, we can
distinguish 10 mutually exclusive cases and denote their probabilities by

$$(\alpha\beta), (\alpha\gamma), (\alpha\delta), (\alpha\epsilon), (\beta\gamma), (\beta\delta), (\beta\epsilon), (\gamma\delta), (\gamma\epsilon), (\delta\epsilon).$$

The required probability will be given by the sum

$$p = (\alpha\beta) + (\alpha\gamma) + (\alpha\delta) + (\alpha\epsilon) + (\beta\gamma) + (\beta\delta) + (\beta\epsilon) + (\gamma\delta) + (\gamma\epsilon) + (\delta\epsilon).$$

On the other hand, the side $\alpha$ can be intersected by a line in four mutually
exclusive ways; namely, together with $\beta$, or $\gamma$, or $\delta$, or $\epsilon$. Hence, if $(\alpha)$ is
the probability of intersection

$$(\alpha) = (\alpha\beta) + (\alpha\gamma) + (\alpha\delta) + (\alpha\epsilon),$$

and similarly

$$(\beta) = (\beta\alpha) + (\beta\gamma) + (\beta\delta) + (\beta\epsilon)$$
$$(\gamma) = (\gamma\alpha) + (\gamma\beta) + (\gamma\delta) + (\gamma\epsilon)$$
$$(\delta) = (\delta\alpha) + (\delta\beta) + (\delta\gamma) + (\delta\epsilon)$$
$$(\epsilon) = (\epsilon\alpha) + (\epsilon\beta) + (\epsilon\gamma) + (\epsilon\delta),$$

whence

$$(\alpha) + (\beta) + (\gamma) + (\delta) + (\epsilon) = 2p.$$

But

$$(\alpha) = \frac{2\alpha}{\pi d}, \quad (\beta) = \frac{2\beta}{\pi d}, \quad (\gamma) = \frac{2\gamma}{\pi d}, \quad (\delta) = \frac{2\delta}{\pi d}, \quad (\epsilon) = \frac{2\epsilon}{\pi d}$$

and consequently

$$p = \frac{\alpha + \beta + \gamma + \delta + \epsilon}{\pi d} = \frac{P}{\pi d}$$

where P is the perimeter of the polygonal boundary. Evidently this
result is perfectly general. Since it does not depend upon the number of
sides, by passage to the limit, it can be extended to the case of a plate
bounded by any convex curve.

16. Second Solution of Buffon's Problem. Barbier has given another
extremely ingenious solution of Buffon's problem and of its extension.
Let f(l) be an unknown probability that the needle will intersect a line.
Imagine that the needle is divided into two parts l' and l''. Evidently a
line intersects the needle if, and only if, it intersects either the first or
the second part. Hence, by the theorem of total probabilities

$$f(l) = f(l') + f(l''),$$

whence, as in Sec. 4, we conclude

$$f(l) = Cl$$

where C is a constant independent of l. The whole question is how to
determine this constant. Barbier's ingenious idea was to let this
problem depend on the solution of another one: A polygonal line (convex
or not) is thrown upon the board; what is the mathematical expectation
of the number of points of intersection? The perimeter of the polygonal
line can be subdivided into n rectilinear parts $a_1, a_2, \ldots a_n$ all less than
d. With these n parts we can associate n variables $x_1, x_2, \ldots x_n$, such
that

$$x_i = 1 \quad \text{if one of the lines intersects } a_i$$

$$x_i = 0 \quad \text{otherwise.}$$

The sum

$$s = x_1 + x_2 + \cdots + x_n$$

evidently gives the total number of the points of intersection. Hence

$$E(s) = E(x_1) + E(x_2) + \cdots + E(x_n)$$

and, if $p_i$ is the probability of intersection of $a_i$ with one (and only one)
line,

$$E(x_i) = p_i.$$

But, according to the previous result,

$$p_i = C a_i.$$

Hence, we have a perfectly general formula

$$E(s) = C(a_1 + a_2 + \cdots + a_n) = CP$$

where P is the perimeter of the polygonal line. The result holds for any
curvilinear arc (closed or not) as can be seen by the method of limits.
This formula applied to a circle with the diameter d gives

$$C \pi d = 2$$

since such a circle has always exactly two points of intersection with
the lines of the system. Thus we find that

$$C = \frac{2}{\pi d}$$

and

$$f(l) = \frac{2l}{\pi d}$$

as obtained before. For a closed convex line of sufficiently small
dimensions only two cases are possible: two intersections (probability p), or
none (probability 1 − p), whence E(s) = 2p and

$$2p = \frac{2P}{\pi d}$$

or

$$p = \frac{P}{\pi d}$$

in agreement with the result obtained in Sec. 15.

17. Laplace's Problem. A board is covered with a set of congruent
rectangles as shown in the figure, and a thin needle is
thrown on the board. Supposing that the needle is shorter
than the smaller sides of the rectangles, find the probability
that the needle will be entirely contained in one of the
rectangles of the set.

Solution. Let AB = a, AD = b be the sides of the rectangle which
contains the middle point of the needle, the length of which is

$$l \quad (l < a,\; l < b).$$

Taking AB and AD for coordinate axes, the position of the needle is
determined by two coordinates x, y of its middle point
and the angle $\varphi$ formed by the needle with the x axis.
We may consider x, y, $\varphi$ as three independent variables
with uniform distribution of probability. The domain
filled up with all possible points x, y, $\varphi$ is a
parallelepipedon

$$0 < x < a; \qquad 0 < y < b; \qquad -\frac{\pi}{2} < \varphi < \frac{\pi}{2}$$

and the distribution of probability throughout this domain is uniform.
To characterize the domain of points representing positions of the
middle point of the needle when it is located entirely within ABCD we
consider the sections of that domain by planes $\varphi$ = constant and their
projections on the plane xy. These projections are represented by
the shaded areas in Figs. 13 and 14 corresponding to positive and negative
$\varphi$, respectively.

[Fig. 13. Fig. 14.]

In Fig. 13

$$\angle PAB = \varphi; \qquad AP \parallel BF \parallel CR \parallel DG$$

and $AP = BE = BF = CR = DG = DH = \tfrac{1}{2} l$.

Similarly, in the second figure

$$\angle JAB = \varphi; \qquad AJ \parallel BQ \parallel CL \parallel DS$$

and $AJ = AK = BQ = CL = CM = DS = \tfrac{1}{2} l$.

The area of the rectangle PQRS corresponding to these two cases can be
expressed as follows:

$$\text{Area } PQRS = (a - l \cos \varphi)(b - l \sin \varphi) = ab - l(b \cos \varphi + a \sin \varphi) + l^2 \sin \varphi \cos \varphi,$$

$$\text{Area } PQRS = (a - l \cos \varphi)(b + l \sin \varphi) = ab - l(b \cos \varphi - a \sin \varphi) - l^2 \sin \varphi \cos \varphi.$$

Without distinguishing positive and negative values of $\varphi$, we may write

$$F(\varphi) = \text{area } PQRS = ab - bl \cos \varphi - al\,|\sin \varphi| + \tfrac{1}{2} l^2 |\sin 2\varphi|.$$

The volume of the domain representing positions of the needle entirely
within ABCD is:

$$v = \int_{-\pi/2}^{\pi/2} F(\varphi)\,d\varphi = \pi ab - 2bl - 2al + l^2$$

while

$$V = \pi ab$$

is the volume of the domain

$$0 < x < a, \qquad 0 < y < b, \qquad -\frac{\pi}{2} < \varphi < \frac{\pi}{2}.$$

Hence, the required probability is:

$$p = \frac{v}{V} = 1 - \frac{2l(a + b) - l^2}{\pi ab}$$

and the complementary probability for the needle to intersect the
boundary of one of the rectangles is:

$$q = \frac{2l(a + b) - l^2}{\pi ab}.$$

Buffon's problem may be considered as a limiting case when $a = \infty$
and, indeed, by setting $a = \infty$, we find that

$$q = \frac{2l}{\pi b}$$

in conformity with the result in Sec. 14.
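The volume v and the limiting case can both be checked numerically. The sketch below is an added illustration; the function names and the number of steps are assumptions, and a midpoint rule stands in for the exact integration carried out in the text.

```python
import math


def section_area(phi, a, b, l):
    """F(phi) = ab - b l cos(phi) - a l |sin(phi)| + (1/2) l^2 |sin(2 phi)|."""
    return (a * b - b * l * math.cos(phi) - a * l * abs(math.sin(phi))
            + 0.5 * l * l * abs(math.sin(2 * phi)))


def inside_volume(a, b, l, steps=100_000):
    """Midpoint-rule approximation of the integral of F(phi) over (-pi/2, pi/2)."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        phi = -math.pi / 2 + (i + 0.5) * h
        total += section_area(phi, a, b, l)
    return total * h


def crossing_probability(a, b, l):
    """q = (2 l (a + b) - l^2) / (pi a b), valid for l < a and l < b."""
    return (2 * l * (a + b) - l * l) / (math.pi * a * b)


if __name__ == "__main__":
    a, b, l = 3.0, 2.0, 1.0
    print(inside_volume(a, b, l), math.pi * a * b - 2 * b * l - 2 * a * l + l * l)
    # As a grows, q approaches Buffon's value 2l / (pi b).
    for big_a in (10.0, 1e3, 1e6):
        print(big_a, crossing_probability(big_a, b, l), 2 * l / (math.pi * b))
```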

These examples may suffice to give an idea of problems in geometric 
probabilities. Sylvester, Crofton, and others have enriched this field 
by extremely ingenious methods of evaluating, or rather of avoiding 
evaluations, of very complicated multiple integrals. However, from the 
standpoint of principles, these investigations, ingenious as they are, 
do not contribute much to the general theory of probability. 


[Fig. 15]




Problems for Solution 

1. A point X is taken at random on a rectilinear segment AB = l whose middle
point is O. What is the probability that AX, BX, and AO can form a triangle? The
distribution of AX = x is assumed to be uniform. Ans. 1/2.

2. Two points $X_1$, $X_2$ are taken at random on AB = l.
Assuming uniform distribution of probability, what is the mathematical
expectation of any power n of the distance between $X_1$ and $X_2$?

Ans. $$\frac{1}{l^2} \int_0^l \int_0^l |x_1 - x_2|^n\,dx_1 dx_2 = \frac{2 l^n}{(n + 1)(n + 2)}.$$

3. Three points $X_1$, $X_2$, $X_3$ are taken at random on AB. What is the probability
that $X_3$ lies between $X_1$ and $X_2$?

Ans. 1/3, assuming uniform distribution of probability.

4. A rectilinear segment AB is divided into four equal parts

$$AC = CO = OD = DB.$$

Supposing that the distribution of probability is symmetric with respect to O, let P
be the probability that a point selected at random on AB will be between C and D.
Also, let Q be the probability that the middle point between
two points selected at random will be between C and D. Prove
that

$$Q > \frac{1 + P^2}{2}.$$

[Fig. 16]

Hint: The middle point of a segment $X_1X_2$ is surely between C and D if: (i) $X_1$
and $X_2$ are in CO; or (ii) $X_1$ and $X_2$ are in OD; or (iii) $X_1$ and $X_2$ are on opposite sides
of O.

5. Two points $X_1$, $X_2$ are chosen at random in a circle of radius r. Assuming
uniform distribution of probability, what is the mathematical expectation of their
distance? Ans. Denoting the required mathematical expectation by M, we have

$$\pi^2 r^4 M = \int_0^{2\pi} \int_0^{2\pi} F(r, \theta, \theta')\,d\theta\,d\theta'$$

where

$$F(r, \theta, \theta') = \int_0^r \int_0^r \sqrt{\rho^2 + \rho'^2 - 2\rho\rho' \cos(\theta - \theta')}\;\rho\rho'\,d\rho\,d\rho'.$$

Hence, varying r by dr

$$dF = 2r\,dr \int_0^r \sqrt{r^2 + \rho^2 - 2r\rho \cos(\theta - \theta')}\;\rho\,d\rho$$

and

$$d(\pi^2 r^4 M) = 4\pi r\,dr \int_0^{2\pi} \int_0^r \sqrt{r^2 + \rho^2 - 2r\rho \cos \omega}\;\rho\,d\rho\,d\omega.$$

By introduction of new polar coordinates the integral in the right member can be
exhibited as

$$\int_{-\pi/2}^{\pi/2} d\omega \int_0^{2r \cos \omega} u^2\,du = \frac{16}{3} r^3 \int_0^{\pi/2} \cos^3 \omega\,d\omega = \frac{32}{9} r^3.$$

[Fig. 17]

Thus

$$d(\pi^2 r^4 M) = \frac{128\pi}{9} r^4\,dr$$

whence

$$M = \frac{128 r}{45\pi}.$$

6. A board is covered with congruent rectangles as in Laplace's problem. A coin
the diameter of which is less than the smaller side of the rectangles is thrown on the
board. What is the probability that it will be partly in one rectangle and partly in
another? Ans. a, b, r being respectively the sides of the rectangles and radius of the
coin, the required probability is

$$\frac{2r(a + b - 2r)}{ab}.$$


7. Solve Buffon's problem when the needle is longer than the distance between
two consecutive lines. Ans. The probability for the needle to intersect at least one
line is

$$p = \frac{2l}{\pi d}(1 - \sin \varphi_0) + \frac{2\varphi_0}{\pi}$$

where $\varphi_0$ is determined by $\cos \varphi_0 = d/l$.

8. A board is covered with congruent triangles whose sides are a, b, c. A needle
whose length l is less than the shortest altitude of any one of these triangles is thrown
on the board. What is the probability that the needle will be contained entirely
within one of the triangles? Ans. The required probability is

$$p = 1 + \frac{(Aa^2 + Bb^2 + Cc^2)\,l^2}{2\pi Q^2} - \frac{(4a + 4b + 4c - 3l)\,l}{2\pi Q}$$

where A, B, C are angles opposite to sides a, b, c and Q is double the area of the triangle.
For equilateral triangles 



9. On each of the circles $O_1, O_2, O_3, \ldots$ with respective radii $r_1, r_2, r_3, \ldots$
points $M_1, M_2, M_3, \ldots$ are taken at random. Supposing that the series

$$r_1 + r_2 + r_3 + \cdots$$

is divergent, while the series

$$r_1^2 + r_2^2 + r_3^2 + \cdots$$

is convergent, prove that the probability that the length of the vector

$$OM = O_1M_1 + O_2M_2 + O_3M_3 + \cdots + O_nM_n$$

will be > R tends to 0 as $R \to \infty$ no matter how large n is.

Indication of Solution. Let $x_1, x_2, \ldots x_n$; $y_1, y_2, \ldots y_n$ be components of
$O_1M_1, O_2M_2, \ldots O_nM_n$ on two rectangular axes OX, OY. Then

$$E(x_i^2) = E(y_i^2) = \frac{r_i^2}{2}.$$

By Tshebysheff's lemma (Chap. X, Sec. 1) the probabilities Q and Q' of the inequalities

$$|x_1 + x_2 + \cdots + x_n| > t \sqrt{\frac{r_1^2 + r_2^2 + \cdots + r_n^2}{2}}$$

$$|y_1 + y_2 + \cdots + y_n| > t \sqrt{\frac{r_1^2 + r_2^2 + \cdots + r_n^2}{2}}$$

are both less than $1/t^2$. Now, if the length OM > R then either

$$|x_1 + x_2 + \cdots + x_n| > \frac{R}{\sqrt{2}} = t \sqrt{\frac{r_1^2 + r_2^2 + \cdots + r_n^2}{2}}$$

or

$$|y_1 + y_2 + \cdots + y_n| > \frac{R}{\sqrt{2}} = t \sqrt{\frac{r_1^2 + r_2^2 + \cdots + r_n^2}{2}}.$$

Hence, the probability P for the length of OM to be > R is less than Q + Q';
that is,

$$P < Q + Q' < \frac{2}{t^2} = \frac{2(r_1^2 + r_2^2 + \cdots + r_n^2)}{R^2}.$$


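10. Prove that

$$\lim_{n \to \infty} \int_0^1 \int_0^1 \cdots \int_0^1 \frac{x_1^2 + x_2^2 + \cdots + x_n^2}{x_1 + x_2 + \cdots + x_n}\,dx_1 dx_2 \cdots dx_n = \frac{2}{3}.$$

Hint: Considering $x_1, x_2, \ldots x_n$ as continuous stochastic variables with uniform
distribution over the interval (0, 1) prove with the help of Tshebysheff's inequality
that the probability of

$$\frac{2}{3} - \epsilon < \frac{x_1^2 + x_2^2 + \cdots + x_n^2}{x_1 + x_2 + \cdots + x_n} < \frac{2}{3} + \epsilon$$

for any $\epsilon > 0$ tends to 1 as $n \to \infty$.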
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CHAPTER XIII 


THE GENERAL CONCEPT OF DISTRIBUTION 

!• In dealing with continuous stochastic variables we have introduced 
the important concept of the function of distribution. Denoting the 
density of probability by/( 2 ), this function was defined by 

m = ji^)dz 

and it represents the probability of the inequality 

X < t. 

For a variable with a finite number of values the function of distribution
can be defined as the sum

$$F(t) = \sum_{x_i < t} p_i$$

where $p_1, p_2, \ldots p_n$ are respective probabilities of all possible values
$x_1, x_2, \ldots x_n$ of the variable x. The notation $x_i < t$ is intended to
show that the summation is extended over all values of x less than t.
Again, F(t) for any real t represents the probability of the inequality

$$x < t.$$

In this case F(t) is a discontinuous function, never decreasing and varying
between $F(-\infty) = 0$ and $F(+\infty) = 1$. Its discontinuities are located
at the points $x_1, x_2, \ldots x_n$ and are such that

$$F(x_i + 0) - F(x_i - 0) = p_i,$$

denoting, in the customary way,

$$F(x_i + 0) = \lim F(x_i + \epsilon), \qquad F(x_i - 0) = \lim F(x_i - \epsilon)$$

when $\epsilon$, through positive values, converges to 0. To represent F(t)
graphically we note that

$$F(t) = \begin{cases} 0 & \text{for } t \le x_1 \\ p_1 & \text{for } x_1 < t \le x_2 \\ p_1 + p_2 & \text{for } x_2 < t \le x_3 \\ \cdots & \\ p_1 + p_2 + \cdots + p_n & \text{for } x_n < t. \end{cases}$$

As for the value of F(t) at the point $t = x_i$, it is $F(x_i - 0)$. Hence,
the graph of F(t) consists of rectilinear segments as shown in the figure
(for n = 4; $x_1 = -2$; $x_2 = 0$; $x_3 = 1$; $x_4 = 3$; $p_1 = p_2 = p_3 = p_4 = 1/4$)
and belongs to the so-called step lines.
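
The definition by a sum can be tried numerically. What follows is only a minimal sketch, not part of the original exposition: the name `step_distribution` is ours, and the data are those of the figure, with the four probabilities read as 1/4 each. It keeps the convention that F(t) is the probability of x < t, so that at a point of discontinuity the function takes its left-hand value.

```python
def step_distribution(t, values, probs):
    """F(t) = sum of p_i over all possible values x_i strictly less than t."""
    return sum(p for x, p in zip(values, probs) if x < t)

# Data of the figure: n = 4 values, probabilities assumed equal to 1/4 each.
values = [-2, 0, 1, 3]
probs = [0.25, 0.25, 0.25, 0.25]

for t in (-3, -2, -1, 0, 0.5, 1, 2, 3, 4):
    print(t, step_distribution(t, values, probs))
```

Evaluating the function exactly at one of the values x_i and slightly to the right of it shows the discontinuity p_i described above.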

Thus, in case of a continuous variable the distribution function is
given by an integral, and in case of a discontinuous variable, by a sum.
In stating theorems equally true for continuous and discontinuous
variables, it would be tedious always to distinguish these two cases.
The question naturally arises whether it is possible to represent
distribution functions, moments, and similar quantities by using new symbols
equally applicable to continuous and discontinuous variables. In a
similar kind of investigation Stieltjes was confronted with the same
difficulties and he succeeded in overcoming them by introducing a new
kind of integrals known as "Stieltjes' integrals."

Fig. 19. (Step-line graph of F(t) for the example above.)

Stieltjes' Integrals

2. Let $\varphi(x)$ be a never decreasing function defined in the interval
$a \le x \le b$. For any particular value $x_0$ of the argument both the limits
(for $\epsilon$ converging to 0 through positive values)

$$\lim \varphi(x_0 + \epsilon) = \varphi(x_0 + 0), \qquad \lim \varphi(x_0 - \epsilon) = \varphi(x_0 - 0)$$

exist. Since evidently

$$\varphi(x_0 - 0) \le \varphi(x_0) \le \varphi(x_0 + 0),$$

$x_0$ is a point of continuity of $\varphi(x)$ if

$$\varphi(x_0 - 0) = \varphi(x_0 + 0).$$

If, however,

$$\varphi(x_0 - 0) < \varphi(x_0 + 0),$$

$\varphi(x)$ is discontinuous at $x_0$, and the difference

$$m_0 = \varphi(x_0 + 0) - \varphi(x_0 - 0)$$

gives the measure of discontinuity or simply discontinuity. Since
for any number of points of discontinuity $x_0, x_1, \ldots x_n$ the sum of
discontinuities

$$m_0 + m_1 + \cdots + m_n \le \varphi(b) - \varphi(a),$$

the points of discontinuity form a countable set. For there are only a
finite number of discontinuities above any given number, so that,
considering the sequence

$$\epsilon > \epsilon_1 > \epsilon_2 > \cdots$$

tending to 0, there is only a finite number of points with discontinuities
$> \epsilon$; also a finite number of points with discontinuities $\le \epsilon$ and $> \epsilon_1$,
and so on. It follows that points of discontinuity can be arranged into
a single sequence and hence form a countable set.

It may happen, however, that $\varphi(x)$ may have discontinuities in any
interval, no matter how small; but at any rate there are points of
continuity in any interval. If $\varphi(x_0 + \epsilon) > \varphi(x_0 - \epsilon)$ for all sufficiently small
$\epsilon > 0$ the point $x_0$ is called a "point of increase" of $\varphi(x)$. In particular,
any point of discontinuity is a point of increase.

3. Let f(x) be a continuous function in the interval $a \le x \le b$. By
inserting points $x_1 < x_2 < \ldots < x_n$ this interval is subdivided into
n + 1 partial intervals. In each of these we arbitrarily select points
$\xi_0, \xi_1, \ldots \xi_n$ and form the sum

$$S = f(\xi_0)[\varphi(x_1) - \varphi(a)] + f(\xi_1)[\varphi(x_2) - \varphi(x_1)] + \cdots + f(\xi_n)[\varphi(b) - \varphi(x_n)].$$

It can be proved in the same way as for ordinary integrals that when
all intervals

$$x_1 - a, \quad x_2 - x_1, \quad \ldots \quad b - x_n$$

tend to zero uniformly, the sum S tends to a definite limit. This limit,
called Stieltjes' integral, does not depend upon the manner of subdividing
the interval (a, b) or upon the choice of points $\xi_0, \xi_1, \ldots \xi_n$. It has
a perfectly definite value as soon as f(x) and $\varphi(x)$ (together with a, b)
are given, and accordingly is denoted by

$$\int_a^b f(x)\,d\varphi(x).$$

In case $\varphi(x)$ has a continuous derivative, $d\varphi(x)$ can be interpreted
as the ordinary differential; Stieltjes' integral then coincides with the
ordinary one. In other cases $d\varphi(x)$ is a new symbol introduced as a
reminder of the origin of Stieltjes' integral. In particular, if $\varphi(x)$ is a
step function with discontinuities $p_1, p_2, p_3, \ldots$ at the points
$x_1, x_2, x_3, \ldots$, Stieltjes' integral coincides with the sum

$$\sum p_i f(x_i)$$

which is a finite sum or an absolutely convergent infinite series according
as the set of points of discontinuity is finite or infinite.
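
The two special cases just mentioned can be illustrated by forming the sum S directly. The following is a rough sketch under our own assumptions: the names `stieltjes_sum` and `step_phi` are hypothetical, the partial intervals are taken equal, each point ξ is taken at the middle of its interval, and the step function uses illustrative discontinuities of 1/4 at the points −2, 0, 1, 3.

```python
def stieltjes_sum(f, phi, a, b, n):
    """Sum S for n equal partial intervals of (a, b); each point xi is
    taken at the middle of its partial interval."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        left = a + k * h
        right = a + (k + 1) * h
        xi = 0.5 * (left + right)
        total += f(xi) * (phi(right) - phi(left))
    return total

# Case 1: phi(x) = x**2 has a continuous derivative on (0, 1), so the
# Stieltjes integral of x d(x**2) is the ordinary integral of 2*x**2, i.e. 2/3.
smooth = stieltjes_sum(lambda x: x, lambda x: x * x, 0.0, 1.0, 1000)

# Case 2: phi is a step function with discontinuities 1/4 at -2, 0, 1, 3
# (illustrative data); the integral should coincide with the sum of p_i * f(x_i).
jumps = [(-2.0, 0.25), (0.0, 0.25), (1.0, 0.25), (3.0, 0.25)]

def step_phi(x):
    return sum(p for point, p in jumps if point < x)

f = lambda x: x * x
step = stieltjes_sum(f, step_phi, -3.0, 4.0, 7000)
direct = sum(p * f(point) for point, p in jumps)

print(smooth, step, direct)
```

In the second case only the partial intervals containing a discontinuity contribute to S, which is why the limit reduces to a sum over the points of discontinuity.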
Stieltjes' integrals possess many properties of ordinary integrals.
For instance, the mean-value theorem holds for them in the form:

$$\int_a^b f(x)\,d\varphi(x) = f(\xi)[\varphi(b) - \varphi(a)]$$

where $a \le \xi \le b$. Also, if f(x) has a continuous derivative, we have an
analogue for the integration by parts

$$\int_a^b f(x)\,d\varphi(x) = f(b)\varphi(b) - f(a)\varphi(a) - \int_a^b \varphi(x)\,df(x)$$

where df(x) means an ordinary differential and the integral in the right
member is an ordinary integral. However, some important properties
of ordinary integrals do not hold universally for Stieltjes' integrals. For
instance, considered as functions of b or a, they may have discontinuities.

In the definition of Stieltjes' integral it was assumed that a and b
were finite numbers. Stieltjes' integral over the interval $-\infty, \infty$ is
defined in an ordinary way as being the limit of

$$\int_a^b f(x)\,d\varphi(x)$$

when a and b tend independently to $-\infty$ and $+\infty$, respectively. In
other words,

$$\int_{-\infty}^{\infty} f(x)\,d\varphi(x) = \lim \int_a^b f(x)\,d\varphi(x) \quad \text{when } a \to -\infty,\ b \to +\infty,$$

provided this limit exists. If it does not exist, the symbol

$$\int_{-\infty}^{\infty} f(x)\,d\varphi(x)$$

has no meaning.

The General Concept of Distribution 
4. The most general type of distribution function of probability,
covering all imaginable cases, is given by a never decreasing function
F(t) defined for all real values of t and varying from $F(-\infty) = 0$ to
$F(+\infty) = 1$. If at points of discontinuity we set

$$F(t) = F(t - 0),$$

then for any t the probability of the inequality

$$x < t$$

will be given by F(t). Also, the probability of the inequalities

$$t_1 \le x < t_2$$

will be

$$F(t_2) - F(t_1).$$
The case of continuous F(t), having a continuous derivative f(t)
(save for a finite set of points of discontinuity), corresponds to a
continuous variable distributed with the density f(t), since

$$F(t) = \int_{-\infty}^{t} f(x)\,dx.$$

If F(t) is a step function with a finite number of discontinuities, it
characterizes the distribution of probability of a variable with a finite number
of values. Finally, if F(t) is a step function with an infinite set of
discontinuities distributed without density, it corresponds to a variable
whose values can be arranged in a sequence according to their magnitude.
These are the most important types of variables considered in the
calculus of probability, and for all of them the distribution function can
be represented by Stieltjes' integral

$$F(t) = \int_{-\infty}^{t} dF(x).$$

The mathematical expectation of any continuous function f(x) is
defined by Stieltjes' integral

$$E\{f(x)\} = \int_{-\infty}^{\infty} f(t)\,dF(t)$$

provided it has a meaning. In particular, moments of the order n (n
positive integer) and absolute moments of the order a (a real) are defined,
respectively, by

$$m_n = \int_{-\infty}^{\infty} t^n\,dF(t); \qquad \mu_a = \int_{-\infty}^{\infty} |t|^a\,dF(t)$$

and we always have

$$|m_n| \le \mu_n.$$

Finally,

$$\varphi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x)$$

is the characteristic function of distribution. Since the integral exists
for any real t, this function is defined for all real values t and satisfies the
inequality

$$|\varphi(t)| \le 1.$$


Inequalities for Moments 

5. Moments of any distribution satisfy certain inequalities, which
it is important to know. They all are particular cases of the following
very general inequality due to Liapounoff.

Liapounoff's Inequality. Let a, b, c be three real numbers satisfying
the inequalities

$$a \ge b \ge c \ge 0$$

and $\mu_a, \mu_b, \mu_c$ absolute moments of orders a, b, c for an arbitrary
distribution. Then the following inequality holds:

$$\mu_b^{\,a-c} \le \mu_c^{\,a-b}\,\mu_a^{\,b-c}.$$

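Proof. a. Let $p_1, p_2, \ldots p_n$; $x_1, x_2, \ldots x_n$ be positive numbers
and

$$\varphi(s) = p_1 x_1^s + p_2 x_2^s + \cdots + p_n x_n^s.$$

Then for arbitrary real numbers $s_1, s_2, \ldots s_p$ the following inequality
holds:

$$\left[\varphi\!\left(\frac{s_1 + s_2 + \cdots + s_p}{p}\right)\right]^p \le \varphi(s_1)\varphi(s_2)\cdots\varphi(s_p). \tag{1}$$

For p = 2 this inequality follows immediately from the known inequality
due to Cauchy:

$$\left(\sum_{i=1}^{n} a_i b_i\right)^2 \le \sum_{i=1}^{n} a_i^2 \cdot \sum_{i=1}^{n} b_i^2$$

by taking in it

$$a_i = \sqrt{p_i x_i^{s_1}}, \qquad b_i = \sqrt{p_i x_i^{s_2}}.$$

For p = 4 we have

$$\left[\varphi\!\left(\frac{s_1 + s_2 + s_3 + s_4}{4}\right)\right]^4 \le \left[\varphi\!\left(\frac{s_1 + s_2}{2}\right)\right]^2\left[\varphi\!\left(\frac{s_3 + s_4}{2}\right)\right]^2 \le \varphi(s_1)\varphi(s_2)\varphi(s_3)\varphi(s_4)$$

and continuing in the same manner we find in general that

$$\left[\varphi\!\left(\frac{s_1 + s_2 + \cdots + s_{2^m}}{2^m}\right)\right]^{2^m} \le \varphi(s_1)\varphi(s_2)\cdots\varphi(s_{2^m}).$$

Let m be taken so that $2^m > p$ and let us take in the last inequality

$$s_{p+1} = s_{p+2} = \cdots = s_{2^m} = s = \frac{s_1 + s_2 + \cdots + s_p}{p}.$$

Since

$$\frac{s_1 + s_2 + \cdots + s_{2^m}}{2^m} = \frac{ps + (2^m - p)s}{2^m} = s$$

we shall have

$$\varphi(s)^{2^m} \le \varphi(s_1)\varphi(s_2)\cdots\varphi(s_p)\,\varphi(s)^{2^m - p}$$

whence

$$\varphi(s)^p \le \varphi(s_1)\varphi(s_2)\cdots\varphi(s_p),$$

which is inequality (1).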

b. Let $a \ge b \ge c \ge 0$ be integers. Taking p = a − c;
$s_1 = s_2 = \cdots = s_{a-b} = c$; $s_{a-b+1} = \cdots = s_{a-c} = a$, we have

$$\frac{s_1 + s_2 + \cdots + s_{a-c}}{a - c} = \frac{(a - b)c + (b - c)a}{a - c} = b$$

and consequently, by virtue of (1),

$$\left(\sum_{i=1}^{n} p_i x_i^b\right)^{a-c} \le \left(\sum_{i=1}^{n} p_i x_i^c\right)^{a-b}\left(\sum_{i=1}^{n} p_i x_i^a\right)^{b-c}. \tag{2}$$

If a = p/s, b = q/s, c = r/s are rational numbers ($a \ge b \ge c \ge 0$),
it suffices to take, in (2), p, q, r instead of a, b, c, replace $x_i$ by $x_i^{1/s}$, and
raise both members to the power 1/s to ascertain that (2) holds for
rational a, b, c. Finally, the passage to the limit makes it clear that (2)
holds for real a, b, c, provided $a \ge b \ge c \ge 0$.

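c. Let the interval A to B be subdivided into partial intervals by
inserting numbers $t_1 < t_2 < \cdots < t_n$ between A and B and let

$$p_0 = F(t_1) - F(A), \quad p_1 = F(t_2) - F(t_1), \quad \ldots \quad p_n = F(B) - F(t_n)$$

$$x_0 = |A|, \quad x_1 = |t_1|, \quad \ldots \quad x_n = |t_n|.$$

Then the three sums

$$\sum_{0}^{n} p_i x_i^a, \qquad \sum_{0}^{n} p_i x_i^b, \qquad \sum_{0}^{n} p_i x_i^c$$

will tend to the respective limits

$$\int_A^B |t|^a\,dF(t), \qquad \int_A^B |t|^b\,dF(t), \qquad \int_A^B |t|^c\,dF(t)$$

when all differences $t_1 - A, t_2 - t_1, \ldots B - t_n$ tend to 0 uniformly.
Hence, passing to the limit in (2), we get

$$\left(\int_A^B |t|^b\,dF(t)\right)^{a-c} \le \left(\int_A^B |t|^c\,dF(t)\right)^{a-b}\left(\int_A^B |t|^a\,dF(t)\right)^{b-c},$$

and finally, letting A tend to $-\infty$ and B to $+\infty$,

$$\left(\int_{-\infty}^{\infty} |t|^b\,dF(t)\right)^{a-c} \le \left(\int_{-\infty}^{\infty} |t|^c\,dF(t)\right)^{a-b}\left(\int_{-\infty}^{\infty} |t|^a\,dF(t)\right)^{b-c}$$

or

$$\mu_b^{\,a-c} \le \mu_c^{\,a-b}\,\mu_a^{\,b-c}$$

as stated.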
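Taking $b = \frac{a + c}{2}$, Liapounoff's inequality becomes

$$\mu_{\frac{a+c}{2}}^{\,a-c} \le \mu_c^{\frac{a-c}{2}}\,\mu_a^{\frac{a-c}{2}}$$

whence

$$\mu_{\frac{a+c}{2}}^{2} \le \mu_c\,\mu_a$$

for any two real positive numbers a and c. If k and l are two positive
integers and we take c = 2k, a = 2l, then

$$\mu_{k+l}^2 \le \mu_{2k}\,\mu_{2l}$$

or

$$m_{k+l}^2 \le m_{2k}\,m_{2l}$$

since

$$|m_{k+l}| \le \mu_{k+l} \quad \text{and} \quad \mu_{2k} = m_{2k}; \quad \mu_{2l} = m_{2l}.$$

Another important inequality results if we take c = 0. Then, since
$\mu_0 = 1$,

$$\mu_b^{\,a} \le \mu_a^{\,b}$$

or

$$\mu_b^{\frac{1}{b}} \le \mu_a^{\frac{1}{a}}$$

if a > b > 0. This amounts to

$$\frac{\log \mu_b}{b} \le \frac{\log \mu_a}{a} \quad \text{if} \quad a > b$$

which is equivalent to the statement that

$$\frac{\log \mu_x}{x}$$

is an increasing function of x for positive x.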
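
These inequalities are easily checked on a variable with a finite number of values. The following is a minimal sketch with data of our own choosing; the names `absolute_moment` and `liapounoff_sides` are hypothetical. It computes both members of Liapounoff's inequality for one choice of a, b, c and also the quantities $\mu_x^{1/x}$ for increasing x.

```python
def absolute_moment(order, values, probs):
    """Absolute moment of the given order for a variable with finitely many values."""
    return sum(p * abs(x) ** order for x, p in zip(values, probs))

def liapounoff_sides(a, b, c, values, probs):
    """Return both members of mu_b**(a-c) <= mu_c**(a-b) * mu_a**(b-c)."""
    mu_a = absolute_moment(a, values, probs)
    mu_b = absolute_moment(b, values, probs)
    mu_c = absolute_moment(c, values, probs)
    return mu_b ** (a - c), mu_c ** (a - b) * mu_a ** (b - c)

# An arbitrary illustrative distribution (hypothetical data).
values = [-2.0, 0.5, 1.0, 3.0]
probs = [0.1, 0.4, 0.3, 0.2]

left, right = liapounoff_sides(3.0, 2.0, 0.5, values, probs)
roots = [absolute_moment(x, values, probs) ** (1.0 / x) for x in (0.5, 1.0, 2.0, 3.0)]
print(left, right)
print(roots)
```

The first printed pair should satisfy left ≤ right, and the list should be increasing, in agreement with the statement about $\log \mu_x / x$.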


Composition of Distribution Functions 
6. An important problem in the calculus of probability is to find the
distribution function of the sum of several independent variables when
distribution functions of these variables are known. It suffices to show
how this problem can be solved for the sum of two independent variables.

Let x and y be two independent variables with the corresponding
distribution functions F(t) and G(t). To find the distribution function
H(t) of their sum

$$z = x + y$$

is the same as to find the probability of the inequality

$$x + y < t$$

for an arbitrary real number t. Here, for the sake of simplicity and in
view of the applications we propose to consider later, we shall assume that
one, at least, of the variables x, y has continuous distribution with
generally continuous density.

At first, let both x and y have continuous distributions so that

$$F(t) = \int_{-\infty}^{t} f(x)\,dx; \qquad G(t) = \int_{-\infty}^{t} g(x)\,dx.$$

The probability of the inequality

$$x + y < t$$

according to the general principles stated in Chap. XII is expressed by
the double integral

$$H(t) = \iint f(x)g(y)\,dx\,dy$$

extended over the domain

$$x + y < t.$$

Now, following ordinary rules, we can reduce this double integral to a
repeated integral. To this end, for any fixed x we integrate g(y) between
limits $-\infty$ and t − x, thus obtaining

$$\int_{-\infty}^{t-x} g(y)\,dy = G(t - x).$$

Then, after multiplying by f(x), we integrate the resulting expression
between limits $-\infty$ and $+\infty$ for x. The final result will be

$$H(t) = \int_{-\infty}^{\infty} G(t - x)f(x)\,dx$$

or, written as Stieltjes' integral,

$$H(t) = \int_{-\infty}^{\infty} G(t - x)\,dF(x).$$

In the second place, let x be a discontinuous variable with different
values $x_1, x_2, x_3, \ldots$ and corresponding probabilities $p_1, p_2, p_3, \ldots$.
For $x = x_i$ the inequality

$$x + y < t$$

is equivalent to

$$y < t - x_i$$

and the probability of this inequality is $G(t - x_i)$. Since the probability
of $x = x_i$ is $p_i$, the compound probability of the two events

$$x = x_i, \qquad x + y < t$$

will be

$$p_i\,G(t - x_i).$$

The total probability H(t) of the inequality

$$x + y < t$$

will be expressed by the sum

$$H(t) = \sum p_i\,G(t - x_i)$$

extended over all possible values of x. But this sum can again be written
as Stieltjes' integral:

$$H(t) = \int_{-\infty}^{\infty} G(t - x)\,dF(x). \tag{1}$$

In both cases we obtain the same expression for H(t). Evidently
H(t) can also be defined as the mathematical expectation of G(t − x):

$$H(t) = E\{G(t - x)\}$$

taken with respect to the variable x. The important formula (1) is
known as the formula for composition of distribution functions F(t)
and G(t).
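
For a discontinuous x the composition formula is a finite sum and can be evaluated directly. The sketch below is only an illustration under assumptions of our own: x takes the values 0 and 1 with probability 1/2 each, y is normal with the mean 0 and the standard deviation 1, and the names `normal_cdf` and `compose_discrete` are hypothetical.

```python
from math import erf, sqrt

def normal_cdf(t, sigma=1.0):
    """Distribution function G(t) of a normal variable with the mean 0."""
    return 0.5 * (1.0 + erf(t / (sigma * sqrt(2.0))))

def compose_discrete(t, values, probs, G):
    """H(t) = sum of p_i * G(t - x_i): x discrete, y with distribution function G."""
    return sum(p * G(t - x) for x, p in zip(values, probs))

# Hypothetical example: x takes the values 0 and 1 with probability 1/2 each.
values = [0.0, 1.0]
probs = [0.5, 0.5]

for t in (-2.0, 0.0, 0.5, 1.0, 3.0):
    print(t, compose_discrete(t, values, probs, normal_cdf))
```

By symmetry the sum x + y is distributed symmetrically about 1/2 in this example, so that H(1/2) = 1/2; the printed values also increase from nearly 0 to nearly 1, as a distribution function must.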

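Example. Let x and y be two normally distributed variables with means = 0
and respective standard deviations $\sigma_1$ and $\sigma_2$. Instead of using (1), it is better to
write H(t) as a double integral

$$H(t) = \frac{1}{2\pi\sigma_1\sigma_2}\iint e^{-\frac{x^2}{2\sigma_1^2} - \frac{y^2}{2\sigma_2^2}}\,dx\,dy$$

extended over the domain

$$x + y < t.$$

To evaluate this integral, it is natural to introduce x + y = z as a new variable and
find constants C, D, $\alpha$, $\beta$ so as to have identically

$$\frac{x^2}{2\sigma_1^2} + \frac{y^2}{2\sigma_2^2} = C(x + y)^2 + D(\alpha x + \beta y)^2$$

whence one easily finds

$$C = \frac{1}{2(\sigma_1^2 + \sigma_2^2)}, \qquad D = \frac{1}{2\sigma_1^2\sigma_2^2(\sigma_1^2 + \sigma_2^2)}, \qquad \alpha = \sigma_2^2, \qquad \beta = -\sigma_1^2$$

and

$$\frac{x^2}{2\sigma_1^2} + \frac{y^2}{2\sigma_2^2} = \frac{1}{2(\sigma_1^2 + \sigma_2^2)}\left[(x + y)^2 + \left(\frac{\sigma_2}{\sigma_1}x - \frac{\sigma_1}{\sigma_2}y\right)^2\right].$$

The Jacobian of

$$z = x + y, \qquad u = \frac{\sigma_2}{\sigma_1}x - \frac{\sigma_1}{\sigma_2}y$$

with respect to x, y being

$$\begin{vmatrix} 1 & 1 \\ \dfrac{\sigma_2}{\sigma_1} & -\dfrac{\sigma_1}{\sigma_2} \end{vmatrix} = -\frac{\sigma_1^2 + \sigma_2^2}{\sigma_1\sigma_2},$$

H(t) can be presented as the double integral

$$H(t) = \frac{1}{2\pi(\sigma_1^2 + \sigma_2^2)}\iint e^{-\frac{z^2 + u^2}{2(\sigma_1^2 + \sigma_2^2)}}\,dz\,du$$

with the domain of integration defined by a single inequality:

$$z < t.$$

Hence,

$$H(t) = \frac{1}{2\pi(\sigma_1^2 + \sigma_2^2)}\int_{-\infty}^{t} e^{-\frac{z^2}{2(\sigma_1^2 + \sigma_2^2)}}\,dz\int_{-\infty}^{\infty} e^{-\frac{u^2}{2(\sigma_1^2 + \sigma_2^2)}}\,du$$

or

$$H(t) = \frac{1}{\sqrt{2\pi(\sigma_1^2 + \sigma_2^2)}}\int_{-\infty}^{t} e^{-\frac{z^2}{2(\sigma_1^2 + \sigma_2^2)}}\,dz$$

since

$$\int_{-\infty}^{\infty} e^{-\frac{u^2}{2(\sigma_1^2 + \sigma_2^2)}}\,du = \sqrt{2\pi(\sigma_1^2 + \sigma_2^2)}.$$
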

The expression obtained for H(t) leads to a remarkable conclusion:
The sum of two normally distributed variables with means = 0 and
standard deviations $\sigma_1$ and $\sigma_2$ is also a normally distributed variable with
the mean = 0 and the standard deviation $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$. If the means
of x and y are $a_1$ and $a_2$, then evidently z will be normally distributed
with the mean $a = a_1 + a_2$ and the standard deviation $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$.
Repeated application of this result leads to the following important
theorem:

If $x_1, x_2, \ldots x_n$ are normally distributed independent variables with
means $a_1, a_2, \ldots a_n$ and standard deviations $\sigma_1, \sigma_2, \ldots \sigma_n$, then their sum

$$z = x_1 + x_2 + \cdots + x_n$$

is again normally distributed with the mean $a = a_1 + a_2 + \cdots + a_n$
and the standard deviation $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2}$.

Finally, any linear function

$$u = c_1x_1 + c_2x_2 + \cdots + c_nx_n$$

is normally distributed with the mean $a = c_1a_1 + c_2a_2 + \cdots + c_na_n$
and the standard deviation $\sigma = \sqrt{c_1^2\sigma_1^2 + c_2^2\sigma_2^2 + \cdots + c_n^2\sigma_n^2}$. In
particular, the arithmetic mean

$$\frac{x_1 + x_2 + \cdots + x_n}{n}$$

of identical normally distributed variables with the mean a and the
standard deviation $\sigma$ is normally distributed about the mean a and with
the standard deviation $\sigma/\sqrt{n}$. Hence, the conclusion may be drawn
that the probability P of the inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n} - a\right| < \epsilon$$

is given by

$$P = \frac{2}{\sqrt{2\pi}}\int_0^{\frac{\epsilon\sqrt{n}}{\sigma}} e^{-\frac{u^2}{2}}\,du$$

and rapidly approaches 1 as n increases. This is a more definite form
of the law of large numbers applied to normally distributed (identical or
equal) variables.
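
How rapidly this probability approaches 1 can be seen from a short computation. The following is a sketch only; it relies on the statement that the arithmetic mean is normally distributed with the standard deviation σ/√n, expresses the corresponding probability through the error function of the Python standard library, and uses a function name (`prob_mean_within`) of our own.

```python
from math import erf, sqrt

def prob_mean_within(eps, n, sigma):
    """Probability that the arithmetic mean of n identical normal variables
    differs from its expectation a by less than eps."""
    return erf(eps * sqrt(n) / (sigma * sqrt(2.0)))

# Illustrative values: eps = 0.5, sigma = 1.
for n in (1, 4, 25, 100):
    print(n, prob_mean_within(0.5, n, 1.0))
```

For a single variable and eps equal to σ the function gives the familiar probability of a deviation of less than one standard deviation.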


Determination of Distribution When Its Characteristic Function 

Is Given 

7. One of the most important conclusions to be drawn from the
preceding considerations is that the distribution function of probability
is uniquely determined by the characteristic function. The known
proofs of this fact are rather subtle, owing to the use of conditionally
convergent integrals. However, such integrals can be avoided by
resorting to an ingenious device due to Liapounoff. In the general case, the
distribution function of a variable x has discontinuities. To avoid the
bad effect of these discontinuities, Liapounoff introduces a continuous
variable y that, with reasonable probability, can have values only in the
vicinity of 0. It may be surmised, therefore, that the continuous
distribution function of the sum x + y will approximately represent that
of x and, by disposing of a parameter involved in the distribution function
of y, will tend to it as a limit. To make these explanations more definite,
let y be a normally distributed variable whose distribution function is

$$G(t) = \frac{1}{h\sqrt{\pi}}\int_{-\infty}^{t} e^{-\frac{x^2}{h^2}}\,dx.$$

When h is small, the probabilities of any one of the inequalities

$$y > \epsilon, \qquad y < -\epsilon$$

will be extremely small and even will tend to 0 when h tends to 0. Hence,
the distribution function H(t) of the sum x + y is likely to tend to
F(t) as a limit when h tends to 0.

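To prove this in all rigor, we apply the composition formula (Sec. 6)
to our case. We obtain the following expression for H(t):

$$H(t) = \frac{1}{h\sqrt{\pi}}\int_{-\infty}^{\infty} dF(x)\int_{-\infty}^{t-x} e^{-\frac{z^2}{h^2}}\,dz$$

or, in more convenient form

$$H(t) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} dF(x)\left(\int_{-\infty}^{\frac{t-x}{h}} e^{-u^2}\,du\right)$$

and furthermore, integrating by parts,

$$H(t) = \frac{1}{h\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-\left(\frac{t-x}{h}\right)^2}F(x)\,dx.$$
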

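The integral in the right member can be split into three parts

$$\frac{1}{h\sqrt{\pi}}\int_{-\infty}^{t-\epsilon} e^{-\left(\frac{t-x}{h}\right)^2}F(x)\,dx + \frac{1}{h\sqrt{\pi}}\int_{t-\epsilon}^{t+\epsilon} e^{-\left(\frac{t-x}{h}\right)^2}F(x)\,dx + \frac{1}{h\sqrt{\pi}}\int_{t+\epsilon}^{\infty} e^{-\left(\frac{t-x}{h}\right)^2}F(x)\,dx.$$

Now, for positive T

$$\frac{1}{\sqrt{\pi}}\int_T^{\infty} e^{-u^2}\,du < \frac{1}{2}e^{-T^2}.$$

Making use of this inequality, we find that

$$\frac{1}{h\sqrt{\pi}}\int_{t+\epsilon}^{\infty} e^{-\left(\frac{t-x}{h}\right)^2}F(x)\,dx < \frac{1}{\sqrt{\pi}}\int_{\frac{\epsilon}{h}}^{\infty} e^{-u^2}\,du < \frac{1}{2}e^{-\frac{\epsilon^2}{h^2}}$$

and similarly

$$\frac{1}{h\sqrt{\pi}}\int_{-\infty}^{t-\epsilon} e^{-\left(\frac{t-x}{h}\right)^2}F(x)\,dx < \frac{1}{\sqrt{\pi}}\int_{\frac{\epsilon}{h}}^{\infty} e^{-u^2}\,du < \frac{1}{2}e^{-\frac{\epsilon^2}{h^2}}$$

so that

$$H(t) = \frac{1}{h\sqrt{\pi}}\int_0^{\epsilon} e^{-\frac{u^2}{h^2}}F(t + u)\,du + \frac{1}{h\sqrt{\pi}}\int_0^{\epsilon} e^{-\frac{u^2}{h^2}}F(t - u)\,du + \theta e^{-\frac{\epsilon^2}{h^2}};$$

$$0 < \theta < 1.$$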

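Given an arbitrary $\sigma > 0$, the number $\epsilon$ can be taken so small that

$$0 \le F(t + u) - F(t + 0) < \sigma$$
$$0 \le F(t - 0) - F(t - u) < \sigma$$

for $0 < u < \epsilon$, whence

$$\left|\frac{1}{h\sqrt{\pi}}\int_0^{\epsilon} e^{-\frac{u^2}{h^2}}F(t + u)\,du - \frac{F(t + 0)}{h\sqrt{\pi}}\int_0^{\epsilon} e^{-\frac{u^2}{h^2}}\,du\right| < \sigma$$

$$\left|\frac{1}{h\sqrt{\pi}}\int_0^{\epsilon} e^{-\frac{u^2}{h^2}}F(t - u)\,du - \frac{F(t - 0)}{h\sqrt{\pi}}\int_0^{\epsilon} e^{-\frac{u^2}{h^2}}\,du\right| < \sigma$$

and

$$H(t) = \frac{F(t + 0) + F(t - 0)}{h\sqrt{\pi}}\int_0^{\epsilon} e^{-\frac{u^2}{h^2}}\,du + \theta'\left(2\sigma + e^{-\frac{\epsilon^2}{h^2}}\right); \qquad |\theta'| < 1.$$

On the other hand,

$$\frac{1}{h\sqrt{\pi}}\int_0^{\epsilon} e^{-\frac{u^2}{h^2}}\,du = \frac{1}{2} - \frac{\theta''}{2}e^{-\frac{\epsilon^2}{h^2}}; \qquad 0 < \theta'' < 1,$$

so that finally

$$\left|H(t) - \frac{F(t + 0) + F(t - 0)}{2}\right| < 2\sigma + 2e^{-\frac{\epsilon^2}{h^2}},$$

and for all sufficiently small h ($\epsilon$ being kept fixed)

$$\left|H(t) - \frac{F(t + 0) + F(t - 0)}{2}\right| < 3\sigma;$$

that is,

$$\lim_{h \to 0} H(t) = \frac{F(t + 0) + F(t - 0)}{2}$$

or, if t is a point of continuity,

$$\lim_{h \to 0} H(t) = F(t).$$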

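Now we must find another analytical representation for H(t). To
this end we consider the difference

$$H(t) - H(0) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} dF(x)\int_{-\frac{x}{h}}^{\frac{t-x}{h}} e^{-u^2}\,du,$$

and, to represent in a convenient way the inner integral, we make use
of the known integral

$$\frac{1}{2\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-\frac{v^2}{4} - iuv}\,dv = e^{-u^2}.$$

Multiplying both sides by du and integrating between $-\frac{x}{h}$ and $\frac{t-x}{h}$
we find, after replacing v by hv,

$$\int_{-\frac{x}{h}}^{\frac{t-x}{h}} e^{-u^2}\,du = \frac{1}{2\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\,e^{ivx}\,\frac{1 - e^{-ivt}}{iv}\,dv$$

and

$$H(t) - H(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} dF(x)\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\,e^{ivx}\,\frac{1 - e^{-ivt}}{iv}\,dv.$$
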

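The next step is to reverse the order of integrations, an operation
which can be easily justified in this case. The result will be:

$$H(t) - H(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\,\frac{1 - e^{-ivt}}{iv}\,dv\int_{-\infty}^{\infty} e^{ivx}\,dF(x)$$

or

$$H(t) - H(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\,\varphi(v)\,\frac{1 - e^{-ivt}}{iv}\,dv$$

since

$$\varphi(v) = \int_{-\infty}^{\infty} e^{ivx}\,dF(x).$$

Now, taking the limit of H(t) for h converging to 0, we have at any point
of continuity of F(t)

$$F(t) = C + \frac{1}{2\pi}\lim_{h \to 0}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\,\varphi(v)\,\frac{1 - e^{-ivt}}{iv}\,dv \tag{2}$$

where the constant

$$C = \frac{F(+0) + F(-0)}{2}$$

is determined by the condition $F(-\infty) = 0$. Thus, the distribution
function is completely determined by (2) at all points of continuity when
the characteristic function $\varphi(v)$ is given.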
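
The integral which determines F(t) − C from the characteristic function can also be approximated numerically. The following is a rough sketch, not a method taken from the text: the function name `distribution_increment`, the small fixed value of h, the truncation of the range of v, and the midpoint rule are all assumptions of ours. It is tried on the characteristic functions $e^{-v^2/2}$ (normal, σ = 1) and $e^{-|v|}$ (Cauchy's distribution, a = 1), for which F(t) − F(0) is known in closed form.

```python
import cmath
from math import atan, erf, exp, pi, sqrt

def distribution_increment(phi, t, h=0.01, v_max=40.0, steps=40000):
    """Approximate H(t) - H(0), i.e. (1/2pi) times the integral over all real v
    of exp(-h^2 v^2 / 4) * phi(v) * (1 - exp(-i v t)) / (i v).

    Since phi(-v) is the conjugate of phi(v), this equals (1/pi) times the
    integral over v > 0 of the real part. The midpoint rule on (0, v_max)
    is used, so the integrand is never evaluated at v = 0.
    """
    dv = v_max / steps
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * dv
        kernel = (1.0 - cmath.exp(-1j * v * t)) / (1j * v)
        total += exp(-0.25 * h * h * v * v) * (phi(v) * kernel).real
    return total * dv / pi

normal_phi = lambda v: exp(-0.5 * v * v)   # normal distribution, sigma = 1
cauchy_phi = lambda v: exp(-abs(v))        # Cauchy's distribution, a = 1

print(distribution_increment(normal_phi, 1.0), 0.5 * erf(1.0 / sqrt(2.0)))
print(distribution_increment(cauchy_phi, 1.0), atan(1.0) / pi)
```

Each printed pair should agree to several decimals; the small residual difference comes from the auxiliary variable y, whose influence disappears only in the limit h → 0.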


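Example 1. Let us apply (2) to find the distribution corresponding to the
characteristic function

$$\varphi(v) = e^{-\frac{\sigma^2v^2}{2}}.$$

Since in this case the integral whose limit we seek is uniformly convergent with
respect to h, we find simply

$$F(t) = C + \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{\sigma^2v^2}{2}}\,\frac{1 - e^{-ivt}}{iv}\,dv = C + \frac{1}{\pi}\int_0^{\infty} e^{-\frac{\sigma^2v^2}{2}}\,\frac{\sin tv}{v}\,dv.$$

On the other hand (Chap. VII, page 128),

$$\frac{1}{\pi}\int_0^{\infty} e^{-\frac{\sigma^2v^2}{2}}\,\frac{\sin tv}{v}\,dv = \frac{1}{\sigma\sqrt{2\pi}}\int_0^{t} e^{-\frac{u^2}{2\sigma^2}}\,du$$

so that

$$F(t) = C - \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{0} e^{-\frac{u^2}{2\sigma^2}}\,du + \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2\sigma^2}}\,du.$$

Taking $t = -\infty$, the condition $F(-\infty) = 0$ gives

$$C = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{0} e^{-\frac{u^2}{2\sigma^2}}\,du,$$

and so finally

$$F(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2\sigma^2}}\,du.$$

Naturally, we find a normal distribution with the standard deviation $\sigma$ (compare page
270).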

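Example 2. What is the distribution determined by the characteristic function
$\varphi(v) = e^{-a|v|}$, a > 0?

As in the preceding example we find that

$$F(t) = C + \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-a|v|}\,\frac{1 - e^{-ivt}}{iv}\,dv = C + \frac{1}{\pi}\int_0^{\infty} e^{-av}\,\frac{\sin tv}{v}\,dv.$$

But

$$\frac{d}{dt}\int_0^{\infty} e^{-av}\,\frac{\sin tv}{v}\,dv = \int_0^{\infty} e^{-av}\cos tv\,dv = \frac{a}{a^2 + t^2}$$

whence

$$\frac{1}{\pi}\int_0^{\infty} e^{-av}\,\frac{\sin tv}{v}\,dv = \frac{a}{\pi}\int_0^{t}\frac{dx}{a^2 + x^2} = \frac{a}{\pi}\int_{-\infty}^{t}\frac{dx}{a^2 + x^2} - \frac{1}{2}.$$

Thus

$$F(t) = C - \frac{1}{2} + \frac{a}{\pi}\int_{-\infty}^{t}\frac{dx}{a^2 + x^2}$$

and the condition $F(-\infty) = 0$ gives $C = \frac{1}{2}$; finally

$$F(t) = \frac{a}{\pi}\int_{-\infty}^{t}\frac{dx}{a^2 + x^2}.$$

Naturally we find the same distribution as that considered in Example 2, page 243.
Sometimes it is called "Cauchy's distribution" with the parameter a.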

Composition of Characteristic Functions 
8. Having n independent variables $x_1, x_2, \ldots x_n$ whose
characteristic functions are $\varphi_1(t), \varphi_2(t), \ldots \varphi_n(t)$, the product

$$\varphi(t) = \varphi_1(t)\varphi_2(t)\cdots\varphi_n(t)$$

is the characteristic function of their sum

$$s = x_1 + x_2 + \cdots + x_n.$$

In fact, the characteristic function of s is by definition

$$\varphi(t) = E(e^{ist}) = E(e^{ix_1t}\cdot e^{ix_2t}\cdots e^{ix_nt}).$$

Since $x_1, x_2, \ldots x_n$ are independent variables, the expectation of the
product

$$e^{ix_1t}\cdot e^{ix_2t}\cdots e^{ix_nt}$$

is equal to the product of the expectations of the factors, whence

$$\varphi(t) = \varphi_1(t)\varphi_2(t)\cdots\varphi_n(t).$$

This simple theorem is of great importance since it determines the
characteristic function of the sum of independent variables and indirectly
its function of distribution.
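
For variables with a finite number of values the theorem can be verified by direct enumeration. The sketch below is an illustration with hypothetical data (a die and a coin marked 0 and 1) and hypothetical names (`characteristic_function`, `sum_distribution`): the distribution of the sum is found by listing all pairs of values, and its characteristic function is compared with the product of the two separate ones.

```python
import cmath

def characteristic_function(t, pmf):
    """E(exp(i x t)) for a variable given by a dict {value: probability}."""
    return sum(p * cmath.exp(1j * x * t) for x, p in pmf.items())

def sum_distribution(pmf1, pmf2):
    """Distribution of x + y for independent x and y, by direct enumeration."""
    result = {}
    for x, p in pmf1.items():
        for y, q in pmf2.items():
            result[x + y] = result.get(x + y, 0.0) + p * q
    return result

# Hypothetical data: an ordinary die and a coin marked 0 and 1.
die = {k: 1.0 / 6.0 for k in range(1, 7)}
coin = {0: 0.5, 1: 0.5}
total = sum_distribution(die, coin)

for t in (0.0, 0.7, 2.0):
    direct = characteristic_function(t, total)
    product = characteristic_function(t, die) * characteristic_function(t, coin)
    print(t, abs(direct - product))
```

The printed differences should vanish up to rounding errors for every real t.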

9. A few examples will illustrate the preceding remark.

Example 1. Consider n independent normally distributed variables $x_1, x_2, \ldots x_n$
with means = 0 and standard deviations $\sigma_1, \sigma_2, \ldots \sigma_n$. Their characteristic
functions are

$$\varphi_k(t) = e^{-\frac{\sigma_k^2t^2}{2}}; \qquad k = 1, 2, \ldots n$$

and the characteristic function of their sum

$$s = x_1 + x_2 + \cdots + x_n$$

will be

$$\varphi(t) = e^{-\frac{\sigma^2t^2}{2}}$$

where

$$\sigma^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2.$$

Hence s is a normally distributed variable with the mean 0 and the standard deviation

$$\sigma = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2}$$

as we found previously by a method involving a considerable amount of calculation.

Example 2. Independent variables $x_1, x_2, \ldots x_n$ have Cauchy's distributions
with parameters $a_1, a_2, \ldots a_n$. Since the characteristic function of $x_k$ is

$$\varphi_k(t) = e^{-a_k|t|}$$

the characteristic function of the sum

$$s = x_1 + x_2 + \cdots + x_n$$

will be

$$\varphi(t) = e^{-a|t|}$$

where

$$a = a_1 + a_2 + \cdots + a_n.$$

Hence, s again has Cauchy's distribution with the parameter $a_1 + a_2 + \cdots + a_n$.
Example 3. Let $x_1, x_2, \ldots x_n$ be independent variables with uniform
distribution of probability in the interval (0, l). The characteristic function of any one of
them is

$$\frac{1}{l}\int_0^{l} e^{itx}\,dx = \frac{e^{ilt} - 1}{ilt}.$$

Hence, the characteristic function of their sum s will be

$$\varphi(t) = \left(\frac{e^{ilt} - 1}{ilt}\right)^n.$$

The distribution function of s is given by

$$F(t) = C + \frac{1}{2\pi}\lim_{h \to 0}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\left(\frac{e^{ilv} - 1}{ilv}\right)^n\frac{1 - e^{-itv}}{iv}\,dv$$

and, since the integral again is uniformly convergent,

$$F(t) = C + \frac{1}{2\pi}\int_{-\infty}^{\infty}\left(\frac{e^{ilv} - 1}{ilv}\right)^n\frac{1 - e^{-itv}}{iv}\,dv.$$

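The evaluation of this integral presents certain difficulties. To avoid them we
notice that the integrand considered as a function of a complex variable v is
holomorphic everywhere. Hence, we can substitute for the rectilinear path of
integration the path $\Gamma$ as shown in Fig. 20.

Fig. 20. (The path $\Gamma$; the real axis and the origin O are marked.)

Now it is easy to show that integrating over the path $\Gamma$ we have

$$J(g) = \int_\Gamma \frac{e^{igz}}{z^{n+1}}\,dz = \begin{cases} 0 & \text{if } g \ge 0 \\ -2\pi i^{n+1}\dfrac{g^n}{n!} & \text{if } g \le 0. \end{cases}$$

The integral

$$\int_\Gamma \left(\frac{e^{ilz} - 1}{ilz}\right)^n\frac{dz}{iz}$$

being a linear combination of integrals of the type J(g) with $g \ge 0$ reduces to 0.
Similarly,

$$\int_\Gamma \left(\frac{e^{ilz} - 1}{ilz}\right)^n\frac{e^{-itz}}{iz}\,dz = \frac{1}{i^{n+1}l^n}\sum_{k=0}^{n}(-1)^{n-k}\binom{n}{k}J(kl - t)$$

or, in explicit form,

$$\int_\Gamma \left(\frac{e^{ilz} - 1}{ilz}\right)^n\frac{e^{-itz}}{iz}\,dz = -\frac{2\pi}{n!\,l^n}\sum_{0 \le k \le n,\ kl < t}(-1)^k\binom{n}{k}(t - kl)^n.$$

Referring to the above expression of F(t), we find that

$$F(t) = C + \frac{1}{n!\,l^n}\sum_{0 \le k \le n,\ kl < t}(-1)^k\binom{n}{k}(t - kl)^n.$$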
The constant C = 0 since F(t) and the sum in the right member both vanish for
t = 0. The final expression of F(t) is, therefore:

$$F(t) = \frac{1}{1 \cdot 2 \cdot 3 \cdots n \cdot l^n}\left[t^n - \frac{n}{1}(t - l)^n + \frac{n(n - 1)}{1 \cdot 2}(t - 2l)^n - \cdots\right].$$

The series in the right member is continued as long as arguments remain positive.
Such is the probability that the sum

$$x_1 + x_2 + \cdots + x_n$$

of n independent variables, uniformly distributed throughout the interval (0, l), will
be less than t. The above expression is due to Laplace, who, however, obtained it in
quite a different manner.
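
Laplace's expression lends itself to direct computation. The following is a minimal sketch (the name `uniform_sum_cdf` is ours): the alternating series is summed as long as the argument t − kl remains positive, exactly as prescribed above, and the result is divided by n! lⁿ.

```python
from math import comb, factorial

def uniform_sum_cdf(t, n, l=1.0):
    """Probability that the sum of n independent variables, uniformly
    distributed over (0, l), is less than t (Laplace's series)."""
    if t <= 0:
        return 0.0
    total = 0.0
    k = 0
    # The series is continued as long as the argument t - k*l remains positive.
    while k <= n and t - k * l > 0:
        total += (-1) ** k * comb(n, k) * (t - k * l) ** n
        k += 1
    return total / (factorial(n) * l ** n)

print(uniform_sum_cdf(0.3, 1))        # a single variable on (0, 1)
print(uniform_sum_cdf(1.0, 2))        # two variables: half of the unit square
print(uniform_sum_cdf(3.0, 3, 2.0))   # three variables on (0, 2), t at the center
```

By symmetry the probability must equal 1/2 when t is half of the greatest possible value nl, which gives a convenient check of the series.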


Problems for Solution 

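1. Prove directly the inequality

$$\mu_{\frac{a+c}{2}}^2 \le \mu_a\,\mu_c$$

for absolute moments.

Hint: The quadratic form in $\lambda$, $\mu$

$$\int_{-\infty}^{\infty}\left(\lambda|x|^{\frac{a}{2}} + \mu|x|^{\frac{c}{2}}\right)^2 d\varphi(x)$$

is definite or semidefinite. Show that the equality sign cannot hold if $\varphi(x)$ has at
least two points of increase $\alpha$, $\beta$ such that $\alpha/\beta$ is neither 0 nor $\pm 1$.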

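2. Let $x_1, x_2, \ldots x_n$ be n variables. Denoting the absolute moment of the order
$\alpha$ for $x_i$ by $\mu_\alpha^{(i)}$ and by $\omega_\delta$ the quotient

$$\omega_\delta = \frac{\mu_{2+\delta}^{(1)} + \mu_{2+\delta}^{(2)} + \cdots + \mu_{2+\delta}^{(n)}}{\left(\mu_2^{(1)} + \mu_2^{(2)} + \cdots + \mu_2^{(n)}\right)^{1 + \frac{\delta}{2}}}$$

prove that

$$\omega_\delta^{\frac{1}{\delta}} \le \omega_{\delta'}^{\frac{1}{\delta'}}$$

if $\delta' > \delta > 0$.

Hint: Use Liapounoff's inequality.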

3. A variable is distributed over the interval $(0, +\infty)$ with a decreasing density of
probability. Show that in this case moments $M_2$ and $M_4$ satisfy the inequality

$$M_2^2 \le \tfrac{5}{9}M_4 \qquad \text{(Gauss)}$$

and that in general

$$[(\mu + 1)M_\mu]^{\frac{1}{\mu}} \le [(\nu + 1)M_\nu]^{\frac{1}{\nu}}$$

if $\nu > \mu > 0$.

Indication of the Proof. Show first that the existence of the integral

$$\int_0^{\infty} x^{\nu}f(x)\,dx$$

in case f(x) is a positive and decreasing function implies the existence of the limit

$$\lim a^{\nu+1}f(a) = 0; \qquad a \to +\infty.$$

Hence, deduce that

$$\int_0^{\infty} x\,d\varphi(x) = 1, \qquad \int_0^{\infty} x^{\mu+1}\,d\varphi(x) = (\mu + 1)M_\mu, \qquad \int_0^{\infty} x^{\nu+1}\,d\varphi(x) = (\nu + 1)M_\nu$$

where $\varphi(x) = f(0) - f(x)$ and, finally, apply Liapounoff's inequality.

4. Using the composition formula (1), page 269, prove Laplace's formula on
page 278 by mathematical induction.

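5. Prove that the distribution function of probability for a variable whose
characteristic function $\varphi(t)$ is given can be determined by the formula

$$F(t) = C + \lim_{h \to 0}\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{\varphi(v)}{1 + h^2v^2}\,\frac{1 - e^{-itv}}{iv}\,dv.$$

Hint: In carrying out Liapounoff's idea, take an auxiliary variable with the
distribution

$$G(y) = \frac{1}{2h}\int_{-\infty}^{y} e^{-\frac{|x|}{h}}\,dx.$$

Also make use of the integral

$$\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{e^{-itx}\,dx}{1 + x^2} = e^{-|t|}.$$
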

Many definite integrals can be evaluated using the relation between characteristic 
and distribution functions, as the following example shows. 

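6. Let x be distributed over $(-\infty, +\infty)$ with the density $\frac{1}{2}e^{-|x|}$. The
characteristic function being in this case

$$\varphi(t) = \frac{1}{1 + t^2}$$

we find

$$F(t) = C + \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{1 - e^{-ivt}}{iv(1 + v^2)}\,dv = \frac{1}{2}\int_{-\infty}^{t} e^{-|x|}\,dx$$

whence

$$\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\cos tv}{1 + v^2}\,dv = e^{-|t|},$$

an integral due to Laplace.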

7. A variable is said to have Poisson's distribution if it can have only integral
values 0, 1, 2, . . . and the probability of x = k is

$$\frac{a^k e^{-a}}{1 \cdot 2 \cdot 3 \cdots k};$$

the quantity a is called "parameter of distribution." If n variables have Poisson's
distribution with parameters $a_1, a_2, \ldots a_n$, show that their sum has also Poisson's
distribution, the parameter of which is $a_1 + a_2 + \cdots + a_n$.
8. Prove the following result:

$$\frac{1}{\pi}\int_0^{\infty}\left(\frac{\sin v}{v}\right)^n\frac{\sin tv}{v}\,dv = -\frac{1}{2} + \frac{1}{2 \cdot 4 \cdot 6 \cdots 2n}\left[(t + n)^n - \frac{n}{1}(t + n - 2)^n + \frac{n(n - 1)}{1 \cdot 2}(t + n - 4)^n - \cdots\right]$$

the series being continued as long as arguments remain positive.

Hint: Consider the sum of n uniformly distributed variables in the interval
(−1, +1) and express its distribution function in two different ways.

9. Establish the expression for the mathematical expectation of the absolute
value of the sum of n uniformly distributed variables in the interval $(-\frac{1}{2}, +\frac{1}{2})$.
Ans.

$$E|x_1 + x_2 + \cdots + x_n| = \frac{2}{2 \cdot 4 \cdot 6 \cdots (2n + 2)}\left[n^{n+1} - \frac{n}{1}(n - 2)^{n+1} + \frac{n(n - 1)}{1 \cdot 2}(n - 4)^{n+1} - \cdots\right]$$

the series being continued as long as the arguments remain positive.

Hint: Apply Laplace's formula on page 278, conveniently modified, to express the
expectation of $x_1 + x_2 + \cdots + x_n$ and that of $|x_1 + x_2 + \cdots + x_n|$.

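10. Show that under the same conditions as in Prob. 9

$$E|x_1 + x_2 + \cdots + x_n| = \frac{n}{\pi}\int_0^{\infty}\left(\frac{\sin t}{t}\right)^{n-1}\frac{\sin t - t\cos t}{t^3}\,dt.$$

Hint: Prove and use the following formula

$$\lim_{T \to \infty}\int_{-T}^{T}\frac{e^{iwx} - 1}{x^2}\,dx = -\pi|w|.$$
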

11. Let $x_1$ and $x_2$ be two identical and normally distributed variables with the
mean = 0 and the standard deviation $\sigma$. If x is defined as the greater of the values
$|x_1|$, $|x_2|$; that is,

$$x = \max(|x_1|, |x_2|)$$

find the mean value of x as well as that of $x^2$. Ans.

$$E(x) = \frac{2\sigma}{\sqrt{\pi}}; \qquad E(x^2) = \sigma^2\left(1 + \frac{2}{\pi}\right).$$


12. Let

$$x = \min(|x_1|, |x_2|, \ldots |x_n|)$$

where $x_1, x_2, \ldots x_n$ are identical normally distributed variables with the mean = 0
and the standard deviation $\sigma$. Find the mean value of x. Ans. Setting for brevity

$$\Theta(t) = \frac{2}{\sigma\sqrt{2\pi}}\int_0^{t} e^{-\frac{u^2}{2\sigma^2}}\,du$$

we have

$$E(x) = \int_0^{\infty}\{1 - \Theta(t)\}^n\,dt.$$

In particular for n = 2

$$E(x) = \frac{2\sigma}{\sqrt{\pi}}(\sqrt{2} - 1).$$

For large n asymptotically

$$E(x) \sim \frac{\sigma\sqrt{\pi/2}}{n}.$$


13. A variable with the mean = 0 and the standard deviation = 1 is called a
"reduced variable." By changing the origin and the unit of measurement any
variable can be made reduced. For, if x has the mean a and the standard deviation $\sigma$
the variable

$$u = \frac{x - a}{\sigma}$$

is reduced. The distribution function of the reduced variable u can be called the
"reduced law of distribution."

As we have seen, variables $x_1$ and $x_2$ with normal distribution have the same
reduced law of distribution, as does their sum. The question may be raised: Is the
normal law of distribution a unique law possessing this property? (G. Pólya.)

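Solution. Let $x_1$, $x_2$ be two variables for which the second moment of the
distribution exists, so that we can speak of their means and standard deviations. Let $x_1$
have its mean $a_1$ and its standard deviation $\sigma_1$; likewise, let $a_2$ and $\sigma_2$ be the mean and
the standard deviation of $x_2$. Three reduced variables

$$u_1 = \frac{x_1 - a_1}{\sigma_1}, \qquad u_2 = \frac{x_2 - a_2}{\sigma_2}, \qquad u_3 = \frac{x_1 + x_2 - a_1 - a_2}{\sqrt{\sigma_1^2 + \sigma_2^2}}$$

have by hypothesis the same law of distribution. Hence, they have the same
characteristic function $\varphi(t)$ whence we can draw the conclusion that the characteristic
functions of $x_1$, $x_2$, $x_1 + x_2$ are, respectively,

$$\varphi_1(t) = e^{ia_1t}\varphi(\sigma_1t); \qquad \varphi_2(t) = e^{ia_2t}\varphi(\sigma_2t); \qquad \varphi_3(t) = e^{i(a_1 + a_2)t}\varphi\!\left(\sqrt{\sigma_1^2 + \sigma_2^2}\,t\right).$$

Since

$$\varphi_3(t) = \varphi_1(t)\varphi_2(t)$$

we must have for an arbitrary real t

$$\varphi(\sigma_1t)\varphi(\sigma_2t) = \varphi\!\left(\sqrt{\sigma_1^2 + \sigma_2^2}\,t\right)$$

or

$$\varphi(\alpha t)\varphi(\beta t) = \varphi(t) \tag{1}$$

where

$$\alpha = \frac{\sigma_1}{\sqrt{\sigma_1^2 + \sigma_2^2}}, \qquad \beta = \frac{\sigma_2}{\sqrt{\sigma_1^2 + \sigma_2^2}}; \qquad \alpha^2 + \beta^2 = 1.$$

Since (1) holds for every real t, we shall have

$$\varphi(\alpha t) = \varphi(\alpha^2t)\varphi(\alpha\beta t); \qquad \varphi(\beta t) = \varphi(\alpha\beta t)\varphi(\beta^2t)$$

and

$$\varphi(t) = \varphi(\alpha^2t)\,\varphi(\alpha\beta t)^2\,\varphi(\beta^2t). \tag{2}$$

Applying (1) again to each of these factors in the right member of (2), we find that

$$\varphi(t) = \varphi(\alpha^3t)\,\varphi(\alpha^2\beta t)^3\,\varphi(\alpha\beta^2t)^3\,\varphi(\beta^3t) \tag{3}$$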
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and proceeding in the same way, we arrive at the general formula 
(4) <p{t) — • • • (p{^t)^n 

where po, Pi» • • • P« are coefficients in the expansion 
(1 + 2)*^ = Po -f PlZ + • • • + 

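The arguments

$$v_0 = \alpha^n t, \quad v_1 = \alpha^{n-1}\beta t, \quad \ldots \quad v_n = \beta^n t$$

tend uniformly to 0 since $\alpha < 1$, $\beta < 1$. The quotient

$$\frac{\varphi(v) - 1}{v^2} = -\int_{-\infty}^{\infty} t^2\,dF(t)\int_0^1 (1 - x)e^{ivtx}\,dx$$

is represented by a uniformly convergent integral; hence

$$\lim_{v \to 0}\frac{\varphi(v) - 1}{v^2} = -\frac{1}{2}\int_{-\infty}^{\infty} t^2\,dF(t) = -\frac{1}{2}$$

or

$$\varphi(v) = 1 + \left[-\tfrac{1}{2} + \epsilon(v)\right]v^2$$

where

$$\epsilon(v) \to 0 \quad \text{as} \quad v \to 0.$$

At the same time

$$\log \varphi(v) = \left[-\tfrac{1}{2} + \delta(v)\right]v^2 \qquad \text{(principal branch of log)}$$

where again

$$\delta(v) \to 0 \quad \text{as} \quad v \to 0.$$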

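Now, taking logarithms of both members of (4)

$$\log \varphi(t) = -\frac{t^2}{2}(\alpha^2 + \beta^2)^n + \Omega = -\frac{t^2}{2} + \Omega$$

where

$$\Omega = t^2\left[p_0\delta(v_0)\alpha^{2n} + p_1\delta(v_1)\alpha^{2n-2}\beta^2 + \cdots + p_n\delta(v_n)\beta^{2n}\right].$$

Given $\epsilon > 0$, we can take $n$ so large that

$$|\delta(v_i)| < \epsilon; \qquad i = 0, 1, \ldots n$$

whence

$$|\Omega| < \epsilon t^2.$$

Thus

$$\left|\log \varphi(t) + \frac{t^2}{2}\right| < \epsilon t^2$$

and since $\epsilon$ can be taken arbitrarily small,

$$\log \varphi(t) + \frac{t^2}{2} = 0$$

or

$$\varphi(t) = e^{-\frac{t^2}{2}},$$

which shows that the normal law is the only one with the required properties, among all laws with finite second moments.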
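The functional equation $\varphi(\alpha t)\varphi(\beta t) = \varphi(t)$ with $\alpha^2 + \beta^2 = 1$, on which the solution rests, can be checked numerically. The following is only an illustrative sketch: the function names, the choice $\alpha = 0.6$, and the comparison law (a reduced uniform variable on $(-\sqrt{3}, \sqrt{3})$) are assumptions made for the example and do not come from the text.

```python
import math

def phi_normal(t):
    # Characteristic function of the reduced normal law.
    return math.exp(-t * t / 2.0)

def phi_uniform(t):
    # Characteristic function of a reduced uniform variable on
    # (-sqrt(3), sqrt(3)): mean 0, standard deviation 1 (illustrative choice).
    if t == 0.0:
        return 1.0
    a = math.sqrt(3.0) * t
    return math.sin(a) / a

def stability_defect(phi, alpha, ts):
    # Largest value of |phi(alpha t) phi(beta t) - phi(t)| over the grid ts,
    # with beta chosen so that alpha^2 + beta^2 = 1.
    beta = math.sqrt(1.0 - alpha * alpha)
    return max(abs(phi(alpha * t) * phi(beta * t) - phi(t)) for t in ts)

ts = [k / 10.0 for k in range(-50, 51)]
defect_normal = stability_defect(phi_normal, 0.6, ts)
defect_uniform = stability_defect(phi_uniform, 0.6, ts)
print(defect_normal, defect_uniform)
```

The defect for the normal law is zero up to rounding, while the uniform law visibly violates the equation, in agreement with the uniqueness statement.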
CHAPTER XIV


FUNDAMENTAL LIMIT THEOREMS 

1. Bernoulli's theorem, as we have seen in Chap. VII, follows from a more general one known as Laplace's limit theorem. In terms already familiar to us, this theorem can be stated as follows: Let an event $E$ occur $m$ times in a series of $n$ independent trials with constant probability $p$. As $n$ becomes infinite, the distribution function of the quotient

$$\frac{m - np}{\sqrt{npq}}$$

approaches

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du$$

as a limit; or, to state it in a less precise form, the distribution of the above quotient tends to normal.
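Laplace's limit theorem can be illustrated by comparing the exact distribution function of $(m - np)/\sqrt{npq}$ with the normal integral. The sketch below is an illustration only; the helper names and the values $n = 400$, $p = 0.3$ are arbitrary assumptions, not taken from the text.

```python
import math

def normal_cdf(t):
    # (1/sqrt(2 pi)) * integral from -infinity to t of exp(-u^2/2) du
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def binomial_cdf_reduced(n, p, t):
    # Probability that (m - n p)/sqrt(n p q) < t when m is the number of
    # occurrences of E in n independent trials with constant probability p.
    q = 1.0 - p
    bound = n * p + t * math.sqrt(n * p * q)
    total = 0.0
    for m in range(n + 1):
        if m < bound:
            total += math.comb(n, m) * p**m * q**(n - m)
    return total

def max_gap(n, p, ts):
    # Largest discrepancy between the exact and the limiting distribution
    # function over the test points ts.
    return max(abs(binomial_cdf_reduced(n, p, t) - normal_cdf(t)) for t in ts)

ts = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
gap_400 = max_gap(400, 0.3, ts)
print(gap_400)
```

For moderate $n$ the discrepancy is dominated by the jumps of the discrete distribution, which are of the order $1/\sqrt{npq}$.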

Just as Bernoulli's theorem itself is a very particular case of the general law of large numbers, so Laplace's limit theorem is a special case of another extremely general theorem, the discovery of which by Laplace may be considered as the crowning achievement of his persistent efforts, extending over a period of more than twenty years, to find the approximate distribution of probability for sums consisting of a great many independent components with almost arbitrary distributions. The result at which Laplace finally arrived is as astonishing as it is simple: if $x_1, x_2, \ldots x_n$ ($E(x_i) = 0$, $i = 1, 2, \ldots n$) are independent variables (subject to some very mild limitations not stated, however, by Laplace) and $B_n$ is the dispersion of their sum, then for large $n$ the distribution of the quotient

$$\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{B_n}}$$

is nearly normal. To put it more precisely, the distribution function of this quotient tends to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du$$

as $n$ becomes infinite.

Laplace's attempt to prove this important proposition does not stand the test of modern rigor and, besides, cannot easily be made rigorous. The same is true of the attempts made by later investigators, notably Poisson, Cauchy, and many others. Only after a lapse of many years were truly rigorous proofs of Laplace's theorem given. This important achievement is the result of the work of three great Russian mathematicians: Tshebysheff (1887), Markoff (1898), and Liapounoff (1900-1901). An account of Tshebysheff's and Markoff's ingenious investigations is given in Appendix II. Here we shall follow Liapounoff; for his method of proof has the advantage of simplicity even compared with more recent proofs, of which that given by J. W. Lindeberg deserves special mention.¹

2. Before going into details of analysis, we shall state the limit theorem in a very general form due to Liapounoff.

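Laplace-Liapounoff's Theorem. Let $x_1, x_2, \ldots x_n$ be independent variables with their means = 0, possessing absolute moments of the order $2 + \delta$ (where $\delta$ is some number > 0):

$$\mu_{2+\delta}^{(1)}, \quad \mu_{2+\delta}^{(2)}, \quad \ldots \quad \mu_{2+\delta}^{(n)}.$$

If, denoting by $B_n$ the dispersion of the sum $x_1 + x_2 + \cdots + x_n$, the quotient

$$\omega_n = \frac{\mu_{2+\delta}^{(1)} + \mu_{2+\delta}^{(2)} + \cdots + \mu_{2+\delta}^{(n)}}{B_n^{1+\frac{\delta}{2}}}$$

tends to 0 as $n \to \infty$, the probability of the inequality

$$\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{B_n}} < t$$

tends uniformly to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du.$$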

It is natural that the complete proof of a theorem of such character 
cannot be too short, and to make the proof clearer it is advisable to 
divide it into logically separated parts. 

3. The Fundamental Lemma. Let $s_n$ be a variable, depending on an integer $n$, with the mean = 0 and the standard deviation = 1. If its characteristic function

$$\varphi_n(v) = E(e^{ivs_n})$$

tends to

$$e^{-\frac{v^2}{2}}$$

uniformly in any given finite interval $(-l, l)$, then the distribution function $F_n(t)$ of $s_n$ tends uniformly (in the domain of all real values of $t$) to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du.$$

¹ Lindeberg's proof, as well as later proofs by P. Lévy and others, make use of an ingenious artifice due to Liapounoff. Lindeberg explicitly acknowledges his indebtedness to Liapounoff, while Lévy and other French writers fail to give due credit to the great Russian mathematician.

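Proof. a. Together with the variable $s_n$, whose distribution function is $F_n(t)$, Liapounoff considers another variable

$$T_n = s_n + y$$

where $y$ is a normally distributed variable with the distribution function

$$\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\frac{t}{h}} e^{-u^2}\,du.$$

Denoting the distribution function of $T_n$ by $H_n(t)$, we have (Chap. XIII, Sec. 7)

$$(1)\qquad H_n(t) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} dF_n(x)\int_{-\infty}^{\frac{t-x}{h}} e^{-u^2}\,du.$$

On account of the inequality

$$\frac{1}{\sqrt{\pi}}\int_{x}^{\infty} e^{-u^2}\,du < \frac{1}{2}e^{-x^2}; \qquad x > 0$$

we have:

For $t - x < 0$:

$$\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\frac{t-x}{h}} e^{-u^2}\,du = \frac{\theta}{2}e^{-\left(\frac{t-x}{h}\right)^2}; \qquad 0 < \theta < 1.$$

For $t - x \geq 0$:

$$\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\frac{t-x}{h}} e^{-u^2}\,du = 1 - \frac{1}{\sqrt{\pi}}\int_{\frac{t-x}{h}}^{\infty} e^{-u^2}\,du = 1 - \frac{\theta'}{2}e^{-\left(\frac{t-x}{h}\right)^2}; \qquad 0 < \theta' \leq 1.$$

Hence, introducing these expressions into (1),

$$H_n(t) = \int_{-\infty}^{t} dF_n(x) - \frac{\theta'}{2}\int_{-\infty}^{t} e^{-\left(\frac{t-x}{h}\right)^2}dF_n(x) + \frac{\theta}{2}\int_{t}^{\infty} e^{-\left(\frac{t-x}{h}\right)^2}dF_n(x)$$

where again $0 < \theta < 1$; $0 < \theta' < 1$. This leads to the following inequality:

$$|H_n(t) - F_n(t)| < \frac{1}{2}\int_{-\infty}^{\infty} e^{-\left(\frac{t-x}{h}\right)^2}dF_n(x)$$

and consequently

$$|H_n(t) - F_n(t)| < \frac{h}{4\sqrt{\pi}}\left|\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}e^{-itv}\varphi_n(v)\,dv\right|$$

$$(2)\qquad \leq \frac{h}{4\sqrt{\pi}}\left\{\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\left|\varphi_n(v) - e^{-\frac{v^2}{2}}\right|dv + \int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4} - \frac{v^2}{2}}\,dv\right\}.$$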

Here we split the first integral into three $J_1$, $J_2$, $J_3$, taken respectively between limits $-\infty, -l$; $-l, l$; $l, +\infty$ and denote the second integral by $J_4$. Since $|\varphi_n(v) - e^{-\frac{v^2}{2}}| \leq 2$, we shall have


-^\j, +J 31 < 4= 

4v7r V^,. 


O hh)^ 

e ^ dv < 


-y/ TT hi 


because 


e~^^du < 


for positive x. Also 


(4) 




To estimate $J_2$ we shall denote by $\epsilon_n(l)$ the maximum of $|\varphi_n(v) - e^{-\frac{v^2}{2}}|$ in the interval $(-l, l)$. Then


^ 1 r I ^ Zl€„(Z) 


-c ^ Jz; = 2€„(Z). 


Finally, taking into account* ( 2 ), (3), (4), and (5), we find 

(6) |H.(0-F.(,)|<i..(!)+^ + ^4jl- 

b. Expression (1) of $H_n(t)$ can be transformed in a manner similar to that employed in Chap. XIII, Sec. 7, if we first write




e~‘^^du. 
Thus we get


1 1 r* ** 


-^'1 - e-‘VnW. 


1 *1 f* to • j t 

1 I — 2 —^sm (v, , 1 

~s + ii‘ + s. 


-~2~-ttv pa 


-(e 2 — ip^{v))dv. 


>0 • m 

-5-sin tv. 
e 2- dv 

V 


t)2 . . 

■~T--;rSin tv 


2 r ^ 

-I ve ^dv = 

TjO 


0 < 1 - c 4 < 


and consequently 


(7) -I-If < ? + ^ f " 

2 TT Jo V 47r 27r J _ « \v\ 

To find an upper bound of the integral in the right member, we split it into five integrals $I_1$, $I_2$, $I_3$, $I_4$, $I_5$ taken respectively between limits $-\infty, -l$; $-l, -\lambda$; $-\lambda, \lambda$; $\lambda, l$; $l, +\infty$. To estimate $I_3$, we notice that


-1| g 


|e 


xW„(x) = 


Hence 


I'PnW - e 2| g y2. 


ii/.i s <d, - 


To estimate $I_2 + I_4$, we use the inequality $|\varphi_n(v) - e^{-\frac{v^2}{2}}| \leq \epsilon_n(l)$ and we get


^ -f 


r ^ «n(0 
^ Jx y/whX 


Finally, dealing with $I_1$ and $I_5$, we use the obvious inequality

$$\left|\varphi_n(v) - e^{-\frac{v^2}{2}}\right| \leq 2$$

and we obtain




s-f 

TTjl 


~dv / 4e ^ 


Taking into account (7), (8), (9), and (10), the following inequality 
results: 

2 ttJo 4x 2x v^Tr/iX ^ (^0 

In it, since X is still at our disposal, we can take 

X = enil)^h~K 

The inequality thus obtained when combined with (6) gives (a — hi) 


(11) Fnit) 


2 ttJo 


Ae * 2 e 


Vsi'^ 


+ — + 


( 2 I + v^) 


(Za-i)ie„a)5 + 


Here a and I are arbitrary positive numbers. We dispose of them in 
the following manner: Given an arbitrary positive number e, we take a 
so large as to have 

<*2 

1 

X a 3^ 

and after that we select $l$ large enough to make

« _ 4 _ ^ 1 

Vi] 4xr^ 3 "* 

Finally, since for a fixed $l$, $\epsilon_n(l)$ by hypothesis tends to 0 when $n \to \infty$, there exists a number $n_0$ such that


(s + + r-® < I 

for all $n > n_0$. The inequality (11) then shows that

$$\left|F_n(t) - \frac{1}{2} - \frac{1}{\pi}\int_0^{\infty} e^{-\frac{v^2}{2}}\frac{\sin tv}{v}\,dv\right| < \epsilon$$

for $n > n_0$ and this means that

$$\lim_{n \to \infty} F_n(t) = \frac{1}{2} + \frac{1}{\pi}\int_0^{\infty} e^{-\frac{v^2}{2}}\frac{\sin tv}{v}\,dv = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du$$

uniformly in $t$ because the number $n_0$, as clearly follows from the preceding analysis, depends upon $\epsilon$ only and not upon $t$.
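The hypothesis of the fundamental lemma, uniform convergence of $\varphi_n(v)$ to $e^{-v^2/2}$ on a finite interval $(-l, l)$, can be observed numerically in a simple case. The sketch below assumes, purely for illustration, that $s_n$ is the reduced sum of $n$ identical variables uniformly distributed on $(-\frac{1}{2}, \frac{1}{2})$, whose characteristic function is $\sin(u/2)/(u/2)$; the grid-based maximum is only an approximation of the quantity the text calls $\epsilon_n(l)$.

```python
import math

def phi_uniform_sum(n, v):
    # Characteristic function of (x_1 + ... + x_n)/sqrt(n/12), where each x_i
    # is uniform on (-1/2, 1/2) (illustrative assumption).
    u = v / math.sqrt(n / 12.0)
    if u == 0.0:
        return 1.0
    return (math.sin(u / 2.0) / (u / 2.0)) ** n

def eps_n(n, l, steps=400):
    # Grid approximation of the maximum over [-l, l] of
    # |phi_n(v) - exp(-v^2/2)|.
    grid = [-l + 2.0 * l * k / steps for k in range(steps + 1)]
    return max(abs(phi_uniform_sum(n, v) - math.exp(-v * v / 2.0)) for v in grid)

eps_4 = eps_n(4, 5.0)
eps_100 = eps_n(100, 5.0)
print(eps_4, eps_100)
```

The maximum deviation shrinks as $n$ grows, which is exactly what the lemma requires before concluding that $F_n(t)$ tends to the normal integral.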

Remark 1. Without changing anything in the proof, we can state the fundamental lemma in a slightly generalized form as follows: If $t_n$ tends to the limit $t$, the probability of the inequality

$$s_n < t_n$$

tends to

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du.$$


Remark 2. The fundamental lemma, although not explicitly stated by Liapounoff, is implicitly contained in his proof. More general propositions of the same nature have been published by Pólya and Lévy. The very elegant result due to the latter can be stated as follows: If the characteristic function of the variable $s_n$ tends to the characteristic function

$$\varphi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x)$$

of a fixed distribution uniformly in any finite interval, then

$$\lim F_n(t) = F(t)$$

at any point of continuity of $F(t)$.

The above proof, corresponding to the particular case

$$F(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du,$$

can be used, almost without any changes, in proving the general proposition of Lévy.

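4. Proof of Liapounoff's Theorem. a. If Liapounoff's condition

$$\frac{\mu_{2+\delta}^{(1)} + \mu_{2+\delta}^{(2)} + \cdots + \mu_{2+\delta}^{(n)}}{B_n^{1+\frac{\delta}{2}}} \to 0$$

is satisfied for a certain $\delta > 0$, it will be satisfied for all smaller $\delta$.

Let $f_i(t)$ be the distribution function of $x_i$ ($i = 1, 2, \ldots n$). The sum

$$f(t) = f_1(t) + f_2(t) + \cdots + f_n(t)$$

being a nondecreasing function of $t$, the following inequality holds (Chap. XIII, Sec. 5):

$$\left(\int_{-\infty}^{\infty} |t|^b\,df(t)\right)^{a-c} \leq \left(\int_{-\infty}^{\infty} |t|^c\,df(t)\right)^{a-b}\left(\int_{-\infty}^{\infty} |t|^a\,df(t)\right)^{b-c}$$

provided $a > b > c > 0$. We take here

$$a = 2 + \delta, \quad b = 2 + \delta', \quad c = 2$$

supposing $0 < \delta' < \delta$. Then

$$\int_{-\infty}^{\infty} |t|^a\,df(t) = \sum_{k=1}^{n}\mu_{2+\delta}^{(k)}, \qquad \int_{-\infty}^{\infty} |t|^b\,df(t) = \sum_{k=1}^{n}\mu_{2+\delta'}^{(k)}, \qquad \int_{-\infty}^{\infty} |t|^c\,df(t) = B_n$$

and

$$\left(\sum_{k=1}^{n}\mu_{2+\delta'}^{(k)}\right)^{\delta} \leq B_n^{\delta - \delta'}\left(\sum_{k=1}^{n}\mu_{2+\delta}^{(k)}\right)^{\delta'}.$$

But this inequality is equivalent to

$$\left(\frac{\sum_{k=1}^{n}\mu_{2+\delta'}^{(k)}}{B_n^{1+\frac{\delta'}{2}}}\right)^{\frac{1}{\delta'}} \leq \left(\frac{\sum_{k=1}^{n}\mu_{2+\delta}^{(k)}}{B_n^{1+\frac{\delta}{2}}}\right)^{\frac{1}{\delta}}$$

and it shows that

$$\frac{\sum_{k=1}^{n}\mu_{2+\delta'}^{(k)}}{B_n^{1+\frac{\delta'}{2}}} \to 0$$

if

$$\frac{\sum_{k=1}^{n}\mu_{2+\delta}^{(k)}}{B_n^{1+\frac{\delta}{2}}} \to 0$$

provided $0 < \delta' < \delta$. Hence, in the proof we can assume that the fundamental condition is satisfied for some positive $\delta \leq 1$.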

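b. Liapounoff's inequality (Chap. XIII, Sec. 5) with $c = 0$, $b = 2$, $a = 2 + \delta$ when applied to $x_k$ gives

$$b_k^{1+\frac{\delta}{2}} \leq \mu_{2+\delta}^{(k)}.$$

Hence,

$$(12)\qquad \left(\frac{b_k}{B_n}\right)^{1+\frac{\delta}{2}} \leq \omega_n$$

and, since it is assumed that $\omega_n \to 0$, all the quotients

$$\frac{b_k}{B_n} = \frac{b_k}{b_1 + b_2 + \cdots + b_n} \qquad (k = 1, 2, \ldots n)$$

will converge to 0 uniformly as $n \to \infty$.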
c. The following formula can easily be obtained by means of integration by parts:

$$e^{ix} = 1 + ix - \frac{x^2}{2} - x^2\int_0^1 (e^{ixt} - 1)(1 - t)\,dt.$$


If X is real and in absolute value >2, we have 


2 J* (gixi __ 


1)(1 - t)dt\ < 


\ei-t _ i[ g 2. 

If [x] ^ 2, we can use the inequality 


and find 


- 1| ^ 2^t S 2|L^ 


- 1)(1 - 0^^ 


Thus, for every real x 


r2 1^12+5 




Substituting here 


X = = t^k 

VK ^ 


and taking the mathematical expectation of both members, we have 
(13) Mt) = = 1 - |e*| g 1. 

Furthermore, since 


1 - X = - ^x2; X > 0; 0 < 0 < 1, 


we can write 

<«) - i-'=- i(s 

If < 1, we shall have, by virtue of (12), 

< 1 

On 
and consequently



This inequality, together with (13) arid (14), leads to the following 
expression of <pk{t)\ 

bk 

(15) <pk{t) = e (1 + (Tk) 

where 



d. The characteristic function of the variable 

+ X2 + ’ ' ' + Xn 


is 

• • • <Pn{t) 

because Xij 0 : 2 , .. . Xn are independent variables. Hence, by (15) 

(fit) = C“l^'(l + <Ti)(l + C^2) * * • (1 + 0-n) 

<(l + ki|)(l + N) • • • (l + k|)-l<eW + H+---+k.l~l 

and 

(17) Wit) - - 1 

taking into account inequalities (16). Inequality (17) holds if 

< 1 . 

Suppose, now, that t is confined to an arbitrary finite interval 

-I Si 

Because Wn, by hypothesis, tends to 0, the difference 

will tend to 0 as n —^ 00 . In connection with (17) this shows that 

$$\varphi(t) \to e^{-\frac{t^2}{2}}$$

uniformly in any finite interval. It suffices now to invoke the fundamental lemma to complete the proof of Liapounoff's theorem.

5. Particular Cases. This theorem is extremely general and it is hardly possible to find cases of any practical importance to which it could not be applied. Two particularly significant cases deserve special mention.

First Case. Let us suppose that variables $x_1, x_2, \ldots x_n$ are bounded, so that any possible value of any one of them is absolutely less than a constant $C$. Evidently

$$\mu_{2+\delta}^{(k)} \leq C^{\delta}b_k$$

and hence

$$\omega_n \leq \left(\frac{C}{\sqrt{B_n}}\right)^{\delta}.$$

It suffices to assume that

$$B_n = b_1 + b_2 + \cdots + b_n$$

tends to infinity to be sure that $\omega_n \to 0$. Hence, dealing with bounded independent variables, the condition for the validity of the limit theorem is

$$B_n \to \infty \quad \text{as} \quad n \to \infty,$$

which is equivalent to the statement that the series

$$b_1 + b_2 + b_3 + \cdots$$

is divergent.

Poisson's series of trials affords a good illustration of this case. In the usual way, we attach to each of the trials a variable which assumes two values, 1 and 0, according as an event $E$ occurs or fails in that trial. Let $p_i$ and $q_i = 1 - p_i$ be the respective probabilities of the occurrence and failure of $E$ in the $i$th trial. The variable $z_i$ attached to this trial is defined by

$$z_i = 1 \quad \text{if } E \text{ occurs},$$

$$z_i = 0 \quad \text{if } E \text{ fails}.$$

Noticing that

$$E(z_i) = p_i,$$

we introduce new variables

$$x_i = z_i - p_i \qquad (i = 1, 2, \ldots n)$$

with the mean 0, whose sum is given by

$$m - np$$

where $m$ is the number of occurrences of $E$ in $n$ trials and $p$ the mean probability

$$p = \frac{p_1 + p_2 + \cdots + p_n}{n}.$$

In our case

$$b_i = p_iq_i$$

and

$$B_n = \sum_{i=1}^{n} p_iq_i.$$

Hence, we can formulate the following theorem:

Theorem. The probability of the inequality

$$m - np < t\sqrt{B_n}$$

tends uniformly to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du$$

as $n \to \infty$, provided the series

$$\sum_{i=1}^{\infty} p_iq_i$$

is divergent. At the same time the probability of the inequalities

$$t_1\sqrt{B_n} < m - np < t_2\sqrt{B_n}$$

tends uniformly (in $t_1$, $t_2$) to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-\frac{u^2}{2}}\,du.$$
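For Poisson's series of trials the exact distribution of $m$ can be computed by successive convolution, which allows a direct comparison with the normal limit. The following is a sketch under assumed data: the probabilities `ps` cycle through five arbitrary values, and all function names are hypothetical.

```python
import math

def poisson_trials_distribution(ps):
    # dist[m] = probability of exactly m occurrences of E in independent
    # trials whose probabilities are listed in ps.
    dist = [1.0]
    for p in ps:
        new = [0.0] * (len(dist) + 1)
        for m, w in enumerate(dist):
            new[m] += w * (1.0 - p)      # E fails in this trial
            new[m + 1] += w * p          # E occurs in this trial
        dist = new
    return dist

def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def compare(ps, t):
    # Exact probability of m - n p < t sqrt(B_n) against the normal integral,
    # with n p = sum of p_i and B_n = sum of p_i q_i.
    mean = sum(ps)
    b_n = sum(p * (1.0 - p) for p in ps)
    bound = mean + t * math.sqrt(b_n)
    dist = poisson_trials_distribution(ps)
    exact = sum(w for m, w in enumerate(dist) if m < bound)
    return exact, normal_cdf(t)

# Illustrative probabilities: 0.2, 0.35, 0.5, 0.65, 0.8 repeated 100 times.
ps = [0.2 + 0.15 * (k % 5) for k in range(500)]
exact, approx = compare(ps, 1.0)
print(exact, approx)
```

Here the series $\sum p_iq_i$ diverges because every term is at least 0.16, so the condition of the theorem is met.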

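Second Case. Let $z_1, z_2, \ldots z_n$ be identical variables with the common mean $a$ and dispersion $b$. Supposing that for some positive $\delta$

$$E|z_i - a|^{2+\delta} = c$$

exists, we have

$$\omega_n = \frac{nc}{(nb)^{1+\frac{\delta}{2}}} = \frac{c}{b^{1+\frac{\delta}{2}}n^{\frac{\delta}{2}}}$$

and hence $\omega_n \to 0$ as $n \to \infty$. The limit theorem applied to this case can be stated as follows:

The probability of the inequality

$$z_1 + z_2 + \cdots + z_n - na < t\sqrt{nb}$$

tends uniformly to

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du$$

provided

$$E|z_i - a|^{2+\delta}$$

exists for some positive $\delta$. As a corollary we have: The probability of the inequalities

$$-t\sqrt{\frac{b}{n}} < \frac{z_1 + z_2 + \cdots + z_n}{n} - a < t\sqrt{\frac{b}{n}}$$

tends to

$$\frac{2}{\sqrt{2\pi}}\int_0^{t} e^{-\frac{u^2}{2}}\,du.$$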
This proposition is regarded as justification of the ordinary procedure of taking a mean of several observed measurements of the same quantity, made under the same conditions, to approximate its "true value." Barring systematic errors, which should be eliminated by a careful study of the tools used for measurements, the true value of the unknown quantity is regarded as coinciding with the expectation of a set of potentially possible values each having a certain probability of materializing in actual measurement. Since for comparatively small $t$ the above integral comes very near to 1 and

$$t\sqrt{\frac{b}{n}}$$

for large $n$ becomes as small as we please, the probability of the mean of a very large number of observations deviating very little from the true value of the quantity to be measured will be close to 1, and herein lies the justification of the rule of mean mentioned above.
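For identical variables Liapounoff's quotient reduces to $\omega_n = nc/(nb)^{1+\delta/2}$, which decreases like $n^{-\delta/2}$. A small numerical sketch follows; the uniform distribution on $(-\frac{1}{2}, \frac{1}{2})$ with $\delta = 1$ (so that $b = \frac{1}{12}$ and $c = E|z|^3 = \frac{1}{32}$) is an assumption chosen only for illustration, as is the function name.

```python
import math

def omega_identical(n, b, c, delta):
    # Liapounoff's quotient for n identical variables with dispersion b and
    # absolute moment c of order 2 + delta: n c / (n b)^(1 + delta/2).
    return n * c / (n * b) ** (1.0 + delta / 2.0)

# Uniform distribution on (-1/2, 1/2): b = 1/12, E|z|^3 = 1/32.
b, c, delta = 1.0 / 12.0, 1.0 / 32.0, 1.0
omega_100 = omega_identical(100, b, c, delta)
omega_400 = omega_identical(400, b, c, delta)
print(omega_100, omega_400)
```

Quadrupling $n$ halves $\omega_n$ when $\delta = 1$, in agreement with the factor $n^{-\delta/2}$.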


Estimation of the Error Term 
6. The limit theorem is a proposition of an essentially asymptotic character. It states merely that the distribution function $F_n(t)$ of the variable

$$\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{B_n}}$$

approaches the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du$$

as $n$ becomes infinite when a certain condition is fulfilled. For practical purposes it is very important to estimate the error committed by replacing $F_n(t)$ by its limit when $n$ is a finite but very large number. In his original paper Liapounoff had this important problem in his mind and for that reason entered into more detailed elaboration of various parts of his proof than was strictly necessary to establish an asymptotic theorem.

We do not intend to reproduce here this part of Liapounoff's investigation; it suffices to indicate the final result. Assuming the existence of absolute moments of the third order $E|x_i|^3$; $i = 1, 2, \ldots n$, we shall suppose $n$ so large that

_ +yr + - - + ^ 1 

^ 20 

Then, setting

$$F_n(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du + R,$$

we shall have

\R\ < |on[(log 3 ^)^ + 1.1 j + 0 ,= log 

Although this limit for the error term is probably too high, it seems 
to be the best available. However, it is greatly desirable to have a more 
genuine estimation of R, 

7. Hypothesis of Elementary Errors. It is considered as an experimental fact that accidental errors of observations (or measurements) follow closely the law of normal distribution. In the sphere of biology, similar phenomena have been observed as to the size of the bodies and various organs of living organisms. What can be suggested as an explanation of these observed facts? In regard to errors of observations, Laplace proposed a hypothesis which may sound plausible. He considers the total error as a sum of numerous very small elementary errors due to independent causes.

It can hardly be doubted that various independent or nearly independent causes contribute to the total error. In astronomical observations, for instance, slight changes in the temperature, irregular currents of air, vibrations of buildings, and even the state of the organs of perception of an observer may be considered as but a small part of such causes. One can easily understand that the growth of the organs of living organisms is also dependent on many factors of accidental character which independently tend to increase or decrease the size of the organs. If, on the ground of such evidence, we accept Laplace's hypothesis, we can try the explanation of the normal law of distribution on the basis of the general theorems established above.

Suppose that elementary errors do not exceed in absolute value a certain number $l$, very small compared with the standard deviation $\sigma$ of their sum. The quantity denoted by $\omega_n$ in the preceding section will be less than the ratio $l/\sigma$ and hence will be a small number; and the same will be true of the error term $R$. Hence, the distribution of the total error will be nearly normal.

Laplace's explanation of the observed prevalence of normal distributions may be accepted as plausible, at least. But the question may be raised whether elementary errors are small enough and numerous enough to make the difference between the true distribution function of the total error and that of a normal distribution small. Besides, Laplace's hypothesis is based on the principle of superposition of small effects and thus introduces another assumption of an arbitrary character.

Finally, the experimental data quoted in support of the normal distribution of errors of observations and biological measurements are not numerous enough for one to place full confidence in them. Hence, the widely accepted statistical theories based on the normal law of distribution cannot be fully relied on and may be considered merely as substitutes for more accurate knowledge which we do not yet possess in dealing with problems of vital importance in the sphere of human activities.


Limit Theorems for Dependent Variables 

8. The fundamental limit theorem can be extended to sums of dependent variables as, under special assumptions, was shown first by Markoff and later by S. Bernstein, whose work may be considered an outstanding recent contribution to the theory of probability. However, the conditions for the validity of the theorems established by Bernstein are rather complicated, and the whole subject seems to lack ultimate simplicity. For that reason we confine ourselves here to a few special cases.

Example 1. Let us consider a simple chain in which probabilities for an event $E$ to occur in any trial are $p'$ and $p''$, respectively, according as $E$ occurred or failed in the preceding trial. The probability for $E$ to occur at the $n$th trial when the results of other trials are unknown is

$$p_n = p + (p_1 - p)\delta^{n-1}$$

where $p_1$ is the initial probability, $\delta = p' - p''$ and

$$p = \frac{p''}{1 - \delta}.$$

The mean probability for $n$ trials is given by

$$\bar{p}_n = p + \frac{p_1 - p}{n}\cdot\frac{1 - \delta^n}{1 - \delta}$$

so that $p$ may be considered as the mean probability in infinitely many trials.

In the usual way, to trials 1, 2, 3, . . . we attach variables $x_1, x_2, x_3, \ldots$ so that in general

$$x_i = 1 - p_i$$

or

$$x_i = -p_i$$

according as $E$ occurs or fails in the $i$th trial. If $m$ is the number of occurrences of $E$ in $n$ trials, the sum

$$x_1 + x_2 + \cdots + x_n$$

of dependent variables represents

$$m - n\bar{p}_n.$$

Evidently

$$E(m - n\bar{p}_n) = 0$$

and, as we have seen in Chap. XI, Sec. 7,

$$B_n = E(m - n\bar{p}_n)^2 \sim npq\frac{1 + \delta}{1 - \delta},$$

that is, the ratio of $B_n$ to

$$npq\frac{1 + \delta}{1 - \delta}$$

tends to 1 as $n$ becomes infinite.


In order to find an appropriate expression of the characteristic function of the quotient

$$\frac{m - n\bar{p}_n}{\sqrt{B_n}}$$

we shall endeavor first to find the generating function $\omega_n(t)$ for probabilities

$$P_{m,n} \qquad (m = 0, 1, 2, \ldots n)$$

to have exactly $m$ occurrences of $E$ in $n$ trials. Let $A_{m,n}$ be the probability of $m$ occurrences when the whole series ends with $E$ and similarly $B_{m,n}$ the probability of $m$ occurrences when this series ends with $F$, the event opposite to $E$. The following relations follow immediately from the definition of a chain

$$(18)\qquad \begin{aligned} A_{m,n+1} &= A_{m-1,n}p' + B_{m-1,n}p'' \\ B_{m,n+1} &= A_{m,n}q' + B_{m,n}q''. \end{aligned}$$

Let

$$\theta_n(t) = \sum_{m=0}^{n} A_{m,n}t^m; \qquad \psi_n(t) = \sum_{m=0}^{n} B_{m,n}t^m$$

be the generating functions of $A_{m,n}$ and $B_{m,n}$. From relations (18) it follows that

$$(19)\qquad \begin{aligned} \theta_{n+1}(t) &= p't\,\theta_n(t) + p''t\,\psi_n(t) \\ \psi_{n+1}(t) &= q'\theta_n(t) + q''\psi_n(t). \end{aligned}$$

These relations established for $n \geq 1$ will hold even for $n = 0$ if we define $\theta_0(t)$ and $\psi_0(t)$ by

$$p'\theta_0 + p''\psi_0 = p_1$$

$$q'\theta_0 + q''\psi_0 = 1 - p_1$$

whence

$$\theta_0 + \psi_0 = 1.$$

From (19) one can easily conclude that both $\theta_n(t)$ and $\psi_n(t)$ satisfy the same equation in finite differences of the second order

$$\theta_{n+2} - (p't + q'')\theta_{n+1} + \delta t\,\theta_n = 0$$

$$\psi_{n+2} - (p't + q'')\psi_{n+1} + \delta t\,\psi_n = 0.$$

Evidently

$$P_{m,n} = A_{m,n} + B_{m,n}$$

hence

$$\omega_n(t) = \theta_n(t) + \psi_n(t)$$

satisfies the equation

$$(20)\qquad \omega_{n+2} - (p't + q'')\omega_{n+1} + \delta t\,\omega_n = 0$$

and is completely determined by it and the initial conditions

$$\omega_0 = 1, \qquad \omega_1 = q_1 + p_1t.$$

Since

$$p' = p + q\delta, \qquad q'' = q + p\delta$$

the characteristic equation corresponding to (20) can be written

$$(\zeta - 1)(\zeta - \delta) = (t - 1)[(p + q\delta)\zeta - \delta]$$

and for small $t - 1$ its roots can be expanded into power series

$$\zeta_1 = 1 + c_1(t - 1) + c_2(t - 1)^2 + \cdots$$

$$\zeta_2 = \delta + d_1(t - 1) + d_2(t - 1)^2 + \cdots.$$

The general expression of $\omega_n(t)$ will be

$$\omega_n(t) = A\zeta_1^n + B\zeta_2^n$$

where to satisfy the initial conditions we must take

$$A = \frac{\zeta_2 - q_1 - p_1t}{\zeta_2 - \zeta_1}, \qquad B = \frac{q_1 + p_1t - \zeta_1}{\zeta_2 - \zeta_1}.$$

Having found $\omega_n(t)$, the characteristic function of

$$s_n = \frac{m - n\bar{p}_n}{\sqrt{B_n}}$$

will be given by

$$\varphi_n(v) = e^{-\frac{in\bar{p}_nv}{\sqrt{B_n}}}\omega_n\left(e^{\frac{iv}{\sqrt{B_n}}}\right).$$

To study the asymptotic behavior of $\varphi_n(v)$ when $v$ is confined to a finite fixed interval $-l \leq v \leq l$ we notice that then

$$u = \frac{v}{\sqrt{B_n}}$$

will be well within the convergence region of the series we are going to consider now. By means of Lagrange's series or otherwise, we find the following expansion of $\log \zeta_1$ in power series of $t - 1$

$$\log \zeta_1 = p(t - 1) - \left(\frac{p^2}{2} - \frac{pq\delta}{1 - \delta}\right)(t - 1)^2 + \cdots$$

convergent for sufficiently small values of $t - 1$. By setting $t = e^{iu}$ we obtain another power series in $u$

$$\log \zeta_1 = ipu - \frac{pq}{2}\cdot\frac{1 + \delta}{1 - \delta}u^2 + \cdots$$

convergent for sufficiently small $u$. Hence

$$\zeta_1^n = e^{inpu - \frac{npq}{2}\frac{1+\delta}{1-\delta}u^2 + nu^3g(u)}$$

$$e^{-in\bar{p}_nu}\zeta_1^n = e^{-in\bar{p}_nu + inpu - \frac{npq}{2}\frac{1+\delta}{1-\delta}u^2 + nu^3g(u)}$$

where $g(u)$ is a bounded function of $u$, $u$ being contained in a certain interval $(-r, r)$. By substituting

$$u = \frac{v}{\sqrt{B_n}}$$

here, we easily conclude that

$$e^{-\frac{in\bar{p}_nv}{\sqrt{B_n}}}\zeta_1^n$$

tends uniformly to the limit

$$e^{-\frac{v^2}{2}}$$

in the interval $-l \leq v \leq l$ while

$$e^{-\frac{in\bar{p}_nv}{\sqrt{B_n}}}\zeta_2^n$$

remains there uniformly bounded. Since, as can easily be seen, $A$ and $B$ can be represented by power series

$$A = 1 + a_1u + a_2u^2 + \cdots$$

$$B = -a_1u - a_2u^2 - \cdots$$

$A$ tends uniformly to 1 and $B$ tends uniformly to 0. Hence, finally, $\varphi_n(v)$ in any fixed interval tends uniformly to $e^{-\frac{v^2}{2}}$. It suffices to apply the fundamental lemma to conclude that the probability of the inequality

$$\frac{m - n\bar{p}_n}{\sqrt{B_n}} < t_n$$

tends uniformly to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du$$

if $t_n$ tends to $t$.

Since $B_n$ is asymptotic to $npq\frac{1+\delta}{1-\delta}$ and $\bar{p}_n$ differs from $p$ by a quantity of the order $1/n$, the inequality

$$m - np < t\sqrt{npq\frac{1 + \delta}{1 - \delta}}$$

can be written in the form

$$m - n\bar{p}_n < t_n\sqrt{B_n}$$

with $t_n$ tending to $t$, whence, using the above established result, the following theorem due to Markoff can be derived:
Theorem. For a simple chain the probability of the inequalities

$$t_1\sqrt{npq\frac{1 + \delta}{1 - \delta}} < m - np < t_2\sqrt{npq\frac{1 + \delta}{1 - \delta}}$$

tends to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-\frac{u^2}{2}}\,du$$

as $n \to \infty$.
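The asymptotic value $npq\frac{1+\delta}{1-\delta}$ of the dispersion $B_n$ for a simple chain can be checked by computing the exact distribution of $m$ from the recurrence relations between $A_{m,n}$ and $B_{m,n}$. The sketch below uses assumed values $p' = 0.7$, $p'' = 0.2$ and starts the chain with $p_1 = p$; the function names are hypothetical.

```python
def chain_distribution(n, p1, p_after_e, p_after_f):
    # Exact distribution of m, the number of occurrences of E in n trials of
    # a simple chain. a[m] (b[m]) is the probability of m occurrences when
    # the series ends with E (with F).
    a = [0.0, p1]
    b = [1.0 - p1, 0.0]
    for _ in range(n - 1):
        na = [0.0] * (len(a) + 1)
        nb = [0.0] * (len(a) + 1)
        for m in range(len(a)):
            na[m + 1] += a[m] * p_after_e + b[m] * p_after_f
            nb[m] += a[m] * (1.0 - p_after_e) + b[m] * (1.0 - p_after_f)
        a, b = na, nb
    return [x + y for x, y in zip(a, b)]

def variance(dist):
    mean = sum(m * w for m, w in enumerate(dist))
    return sum((m - mean) ** 2 * w for m, w in enumerate(dist))

def limit_dispersion(n, p_after_e, p_after_f):
    # n p q (1 + delta)/(1 - delta), with delta = p' - p'' and
    # p = p''/(1 - delta).
    delta = p_after_e - p_after_f
    p = p_after_f / (1.0 - delta)
    return n * p * (1.0 - p) * (1.0 + delta) / (1.0 - delta)

n = 500
dist = chain_distribution(n, 0.4, 0.7, 0.2)   # p = 0.2/(1 - 0.5) = 0.4
ratio = variance(dist) / limit_dispersion(n, 0.7, 0.2)
print(ratio)
```

The ratio is already close to 1 for a few hundred trials, as the statement on $B_n$ predicts.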

Example 2. Considering an indefinite series of Bernoullian trials with the probability $p$ for an event $A$ to occur, we can regard pairs of consecutive trials 1 and 2, 2 and 3, 3 and 4, and so on, as forming a new series of trials which may produce an event $E$ consisting of two successive occurrences of $A$ ($E = AA$) or an event $F$ opposite to $E$ ($F = AB, BA, BB$). With respect to $E$ the trials of the new series are no longer independent. Let $m$ be the number of occurrences of $E$ in $n$ trials. Then

$$E(m - np^2) = 0$$

and

$$B_n = E(m - np^2)^2 = np^2q(1 + 3p) - 2p^3q$$

as was shown in Chap. XI, Sec. 6.

Let $P_{m,n}$ be the probability of exactly $m$ occurrences of $E$ in a series of $n$ trials. Evidently

$$P_{m,n} = A_{m,n} + B_{m,n}$$

where $A_{m,n}$ and $B_{m,n}$ are the probabilities of $m$ occurrences of $E$ when the Bernoullian series of $n + 1$ trials ends with $A$ or $B$, respectively. By an easy application of the theorems of total and compound probabilities we get

$$A_{m,n+1} = A_{m-1,n}p + B_{m,n}p$$

$$B_{m,n+1} = A_{m,n}q + B_{m,n}q.$$

Corresponding to these relations the generating functions

$$\theta_n(t) = \sum_{m=0}^{n} A_{m,n}t^m; \qquad \psi_n(t) = \sum_{m=0}^{n} B_{m,n}t^m$$

satisfy the following equations in finite differences:

$$\theta_{n+1} = pt\,\theta_n + p\,\psi_n$$

$$\psi_{n+1} = q\,\theta_n + q\,\psi_n$$

holding even for $n = 0$ if we set $\theta_0 = p$, $\psi_0 = q$. Hence, it follows that $\theta_n(t)$ and $\psi_n(t)$ satisfy the same equations of the second order

$$\theta_{n+2} - (pt + q)\theta_{n+1} + pq(t - 1)\theta_n = 0$$

$$\psi_{n+2} - (pt + q)\psi_{n+1} + pq(t - 1)\psi_n = 0$$

and so does their sum

$$\omega_n(t) = \theta_n(t) + \psi_n(t) = \sum_{m=0}^{n} P_{m,n}t^m.$$

Thus, to determine $\omega_n(t)$ we have the equation

$$\omega_{n+2} - (pt + q)\omega_{n+1} + pq(t - 1)\omega_n = 0$$

and the initial conditions

$$\omega_0 = 1, \qquad \omega_1 = 1 - p^2 + p^2t.$$

The general expression of $\omega_n(t)$ is

$$\omega_n(t) = A\zeta_1^n + B\zeta_2^n = A\zeta_1^n + Bp^nq^n(t - 1)^n\zeta_1^{-n}$$

where $\zeta_1$ and $\zeta_2$ are roots of the equation

$$\zeta^2 - \zeta = p(t - 1)(\zeta - q)$$

and

$$A = \frac{-\zeta_2 + 1 + p^2(t - 1)}{\zeta_1 - \zeta_2}; \qquad B = \frac{\zeta_1 - 1 - p^2(t - 1)}{\zeta_1 - \zeta_2}.$$

If $\zeta_1$ is the root which for $t = 1$ reduces to 1, we easily find the following series

$$\log \zeta_1 = p^2(t - 1) + \frac{p^2(-p^2 + 2pq)}{2}(t - 1)^2 + \cdots$$

or, setting $t = e^{iu}$ and supposing $u$ sufficiently small,

$$\log \zeta_1 = ip^2u - \frac{p^2q(1 + 3p)}{2}u^2 + \cdots.$$

As to $A$ and $B$, they can be developed into series of the form

$$A = 1 + cu^2 + \cdots$$

$$B = -cu^2 - \cdots.$$

Hence, reasoning in the same manner as in Example 1, we can conclude that the characteristic function

$$\varphi_n(v) = e^{-\frac{inp^2v}{\sqrt{B_n}}}\omega_n\left(e^{\frac{iv}{\sqrt{B_n}}}\right)$$

of the variable

$$\frac{m - np^2}{\sqrt{B_n}}$$

tends to the limit $e^{-\frac{v^2}{2}}$ uniformly in any finite and fixed interval $-l \leq v \leq l$. Referring, finally, to the fundamental lemma, we reach the following conclusion: The probability of the inequalities

$$t_1\sqrt{np^2q(1 + 3p)} < m - np^2 < t_2\sqrt{np^2q(1 + 3p)}$$

tends uniformly (with respect to $t_1$ and $t_2$) to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-\frac{u^2}{2}}\,du$$

as $n \to \infty$.
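The recurrence relations of Example 2 lend themselves to a direct computation of the exact distribution of $m$, which gives an independent check of the dispersion $B_n = np^2q(1 + 3p) - 2p^3q$. The following sketch assumes $n = 50$ and $p = 0.3$; the function names are hypothetical.

```python
def double_success_distribution(n, p):
    # Distribution of m, the number of pairs of consecutive trials in which A
    # occurs both times, among n + 1 Bernoullian trials (n pairs).
    # a[m] (b[m]) is the probability of m occurrences of E = AA when the
    # Bernoullian series ends with A (with B).
    q = 1.0 - p
    a = [p]   # theta_0 = p: a single trial ending with A, no pair yet
    b = [q]   # psi_0 = q
    for _ in range(n):
        na = [0.0] * (len(a) + 1)
        nb = [0.0] * (len(a) + 1)
        for m in range(len(a)):
            na[m + 1] += a[m] * p          # A after A: one more occurrence of E
            na[m] += b[m] * p              # A after B: no new occurrence
            nb[m] += (a[m] + b[m]) * q     # B after anything
        a, b = na, nb
    return [x + y for x, y in zip(a, b)]

def mean_and_variance(dist):
    mean = sum(m * w for m, w in enumerate(dist))
    var = sum((m - mean) ** 2 * w for m, w in enumerate(dist))
    return mean, var

n, p = 50, 0.3
q = 1.0 - p
mean, var = mean_and_variance(double_success_distribution(n, p))
expected_var = n * p * p * q * (1.0 + 3.0 * p) - 2.0 * p**3 * q
print(mean, var, expected_var)
```

The computed mean agrees with $np^2$ and the computed dispersion with the closed expression, up to rounding.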
Problems for Solution

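1. Consider a series of independent variables $x_1, x_2, x_3, \ldots$ where in general $x_k$ ($k = 1, 2, 3, \ldots$) can have only two values $k^{\alpha}$ and $-k^{\alpha}$ each with the probability $\frac{1}{2}$. Show that the limit theorem holds for the variables thus defined if $\alpha > -\frac{1}{2}$ but the law of large numbers holds only if $\alpha < \frac{1}{2}$.

Solution. Evidently

$$E(x_k) = 0, \qquad E(x_k^2) = k^{2\alpha}, \qquad E|x_k|^3 = k^{3\alpha}.$$

From Euler's formula (Appendix I) we derive two asymptotic expressions

$$B_n = 1^{2\alpha} + 2^{2\alpha} + \cdots + n^{2\alpha} \sim \frac{n^{2\alpha+1}}{2\alpha + 1}$$

$$1^{3\alpha} + 2^{3\alpha} + \cdots + n^{3\alpha} \sim \frac{n^{3\alpha+1}}{3\alpha + 1}.$$

Hence

$$\omega_n \sim \frac{(2\alpha + 1)^{\frac{3}{2}}}{3\alpha + 1}\cdot\frac{1}{\sqrt{n}}$$

so that the limit theorem holds. For $\alpha = \frac{1}{2}$ the probability of the inequalities

$$-\epsilon < \frac{x_1 + x_2 + \cdots + x_n}{n} < \epsilon$$

tends to the limit

$$\frac{2}{\sqrt{\pi}}\int_0^{\epsilon} e^{-u^2}\,du$$

and the law of large numbers does not hold.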

2. Let $m_i$ be the number of successes in $i$ Bernoullian trials with the probability $p$. Show that the limit theorem holds for variables


Si = 


m,- — sp 

Vw 


i = 1, 2, 


n 


but the law of large numbers does not hold (Bernstein). 
Hint: 


8l “f S2 “b * 

• + So = (P9) * 






-f- 


"f" /—Xn 
V n 


where Xi, Xa, . . . Xn are independent variables with two values q and — p associated 
in the customary way with trials 1, 2, . . . n. 

3. Consider an infinite sequence of independent variables $x_1, x_2, x_3, \ldots$ where $x_k$ can have three values

$$0, \qquad (\log k)^{\mu}, \qquad -(\log k)^{\mu}$$

with the corresponding probabilities

$$1 - \frac{1}{(k + a)[\log (k + a)]^{\rho}}, \qquad \frac{1}{2(k + a)[\log (k + a)]^{\rho}}, \qquad \frac{1}{2(k + a)[\log (k + a)]^{\rho}},$$

$a$ being a sufficiently large constant. Moreover, $\mu$ and $\rho$ satisfy the inequality

$$2\mu - \rho + 1 > 0.$$

Show (a) that Liapounoff's condition is satisfied when $\rho < 1$ and hence the limit theorem holds; (b) that this condition is not satisfied if $\rho \geq 1$ and at the same time the limit theorem fails at least for $\rho > 1$.

Solution. a. By using Euler's formula we find

$$\omega_n \sim \frac{(2\mu + 1 - \rho)^{1+\frac{\delta}{2}}}{(2 + \delta)\mu + 1 - \rho}[\log (n + a)]^{-\frac{\delta}{2}(1 - \rho)}.$$

Hence the first part is answered.

b. The probability of the inequality

$$x_1 + x_2 + \cdots + x_n \neq 0$$

is less than

$$\sum_{k=1}^{n}\frac{1}{(k + a)[\log (k + a)]^{\rho}}$$

and this, in case $\rho > 1$, is less than

$$\frac{1}{\rho - 1}(\log a)^{1-\rho}.$$

Hence, the probability of the equality

$$x_1 + x_2 + \cdots + x_n = 0$$

remains always

$$> 1 - \frac{1}{\rho - 1}(\log a)^{1-\rho}$$

and the limit theorem cannot hold. Note that $B_n \to \infty$ because $2\mu - \rho + 1 > 0$.

4. Prove the asymptotic formula

$$1 + n + \frac{n^2}{1\cdot 2} + \cdots + \frac{n^n}{1\cdot 2\cdots n} \sim \frac{e^n}{2},$$

$n$ being a large integer.

Hint: Apply Liapounoff's theorem to $n$ variables distributed according to Poisson's law with parameter 1.
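The asymptotic formula of Problem 4 can be examined numerically by computing $e^{-n}\sum_{k=0}^{n} n^k/k!$, which by the hint is the probability that a sum of $n$ Poisson variables with parameter 1 does not exceed $n$. The sketch below is only an illustration; the function name is hypothetical, and logarithms are used to avoid overflow for large $n$.

```python
import math

def poisson_partial_sum_ratio(n):
    # e^{-n} * (1 + n + n^2/2! + ... + n^n/n!), each term evaluated through
    # logarithms (lgamma(k + 1) = log k!) so that large n causes no overflow.
    total = 0.0
    log_n = math.log(n)
    for k in range(n + 1):
        total += math.exp(k * log_n - math.lgamma(k + 1) - n)
    return total

ratio_100 = poisson_partial_sum_ratio(100)
ratio_10000 = poisson_partial_sum_ratio(10000)
print(ratio_100, ratio_10000)
```

The ratio approaches $\frac{1}{2}$ slowly, the deviation being of the order $1/\sqrt{n}$.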

5. By resorting to the fundamental lemma, prove the following theorem due to Markoff: If for a variable $s_n$ with the mean = 0 and the standard deviation = 1

$$\lim_{n \to \infty} E(s_n^k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} u^ke^{-\frac{u^2}{2}}\,du$$

for any given $k = 3, 4, 5, \ldots$, then the probability of the inequality $s_n < t$ tends to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du.$$


6 . In many special cases the limit of the error term can be considerably lower than 
that given in Sec. 6. For instance, if variables xi, x^, . . . x« are identical and uni¬ 
formly distributed in the interval — } 2 , M the probability Fn{t) of the inequality 


differs from 


Xl X2 Xn < t 




u* 


t du 


by less (in absolute value) than 



12 

_l— 

irm 




the last two terms being completely negligible for somewhat large n. 
Indication of the Proof. First establish the inequalities 


sin <p 


< c 


6 


sin <p 

- -> e; 


6 136 


for 0 ^ <p ^ Tr/2. Further, represent F„(0 by the integral 


Fnit) 



and split it into two integrals taken between 0 and 7r\/nl\/l2 and 7r\/nl \/T2 and 
-h 00. 

7 . Supposing again that xi, x^, . . . arc identical and uniformly distributed in 
the interval — 3^^, 3 ^ 2 , pro /e that for n ^ 2 


E\xi -p X 2 Xn\ 



0 < 0 < 1 . 


8. Let Sn be a variable with the mean = 0 and standard deviation =1. If its 
characteristic function ipn{t) tends to as w —^ uniformly in any finite interval 

show that 
9. If independent variables $x_1, x_2, \ldots x_n$ with means = 0 satisfy Liapounoff's condition, prove that

E\xi + X 2 + * • • + ^ 



10 . Show that for a simple chain of trials 

, l2npgl-\-8 

p being the mean probability in infinite series of trials and $\delta = p' - p''$.

11. A series of dependent trials can be illustrated by the following urn scheme:
Two urns, 1 and 2, contain white and black balls in such proportions that the
probability of drawing a white ball from 1 is p, whereas the probability of drawing a
white ball from 2 is q = 1 − p. Whenever a ball taken from an urn is white, the
next ball is taken from the same urn, but if it is black, the next ball is drawn from the
other urn. The urn at the first drawing is selected by lot, the probabilities of
selecting the first or the second urn being given. Evidently the course of trials is
determined by these rules without any ambiguity. Let m denote the number of white balls
obtained in n drawings and let

$$\alpha = p^2 + q^2.$$

Show that the probability of the inequality


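$$m - \alpha n < t\sqrt{\alpha(1 - \alpha)n}\,\sqrt{\frac{2(1 - 3pq)}{1 - 2pq}}$$

approaches the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du.$$
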
Indication of the Proof. Let 


$$P^{(1)}_{m,n}, \quad P^{(2)}_{m,n}, \quad P^{(3)}_{m,n}, \quad P^{(4)}_{m,n}$$

be the probabilities of having m white balls in n trials when (a) the last ball is white
and from urn 1; (b) the last ball is white and from urn 2; (c) the last ball is black and
from urn 1; and (d) the last ball is black and from urn 2. The sum

$$P_{m,n} = P^{(1)}_{m,n} + P^{(2)}_{m,n} + P^{(3)}_{m,n} + P^{(4)}_{m,n}$$

represents the probability of having exactly m white balls in n trials. The generating
functions of probabilities $P^{(i)}_{m,n}$ satisfy the following equations


= ptU'^ + 

whence it can be shown that they all, as well as their sum (the generating function of
$P_{m,n}$), satisfy the same equation of the second order

$$z_{n+2} - tz_{n+1} + pq(t^2 - 1)z_n = 0.$$

Setting $t = e^{iu}$, one of the characteristic roots will be given by

$$e^{(1 - 2pq)iu - 4pq(1 - 3pq)\frac{u^2}{2} + \cdots}$$

for small u, while the other root tends to 0 as $u \to 0$. The final conclusion can now
be reached in the same way as in Examples 1 and 2, pages 297 and 301.
CHAPTER XV


NORMAL DISTRIBUTION IN TWO DIMENSIONS. LIMIT 
THEOREM FOR SUMS OF INDEPENDENT VECTORS. 
ORIGIN OF NORMAL CORRELATION 

1. The concept of normal distribution can easily be extended to two
and more variables. Since the extension to more than two variables
does not involve new ideas, we shall confine ourselves to the case of
two-dimensional normal distribution.

Two variables, x, y, are said to be normally distributed if for them
the density of probability has the form

$$e^{-\varphi}$$

where

$$\varphi = ax^2 + 2bxy + cy^2 + 2dx + 2ey + f$$

is a quadratic function of x, y becoming positive and infinitely large
together with |x| + |y|. This requirement is fulfilled if, and only if,

$$ax^2 + 2bxy + cy^2$$

is a positive quadratic form. The necessary and sufficient conditions
for this are:

$$a > 0; \qquad ac - b^2 = \Delta > 0.$$

Since $\Delta > 0$ (even a milder requirement $\Delta \ne 0$ suffices), constants $x_0, y_0$
can be found so that

$$\varphi = a(x - x_0)^2 + 2b(x - x_0)(y - y_0) + c(y - y_0)^2 + g$$

identically in x, y. It follows that the density of probability may be
presented thus:

$$e^{-\varphi} = Ke^{-a(x - x_0)^2 - 2b(x - x_0)(y - y_0) - c(y - y_0)^2}.$$

The expression in the right member depends on six parameters K;
a, b, c; $x_0, y_0$. But the requirement

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\varphi}\,dx\,dy = 1$$

reduces the number of independent parameters to five. We can take
a, b, c; $x_0, y_0$ for independent parameters and determine K by the condition

$$K\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-a(x - x_0)^2 - 2b(x - x_0)(y - y_0) - c(y - y_0)^2}\,dx\,dy = 1$$
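which, by introducing new variables

$$\xi = x - x_0, \qquad \eta = y - y_0$$

can be exhibited thus

$$K\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-a\xi^2 - 2b\xi\eta - c\eta^2}\,d\xi\,d\eta = 1.$$

To evaluate this and similar double integrals we observe that the positive
quadratic form

$$a\xi^2 + 2b\xi\eta + c\eta^2$$

can be presented in infinitely many ways as a sum of two squares

$$a\xi^2 + 2b\xi\eta + c\eta^2 = (\alpha\xi + \beta\eta)^2 + (\gamma\xi + \delta\eta)^2$$

whence

$$a = \alpha^2 + \gamma^2, \qquad c = \beta^2 + \delta^2, \qquad b = \alpha\beta + \gamma\delta$$

and

$$(\alpha\delta - \beta\gamma)^2 = \Delta.$$

By changing the signs of $\alpha$ and $\beta$ if necessary, we can always suppose

$$\alpha\delta - \beta\gamma = +\sqrt{\Delta}.$$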

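Now we take

$$u = \alpha\xi + \beta\eta; \qquad v = \gamma\xi + \delta\eta$$

for new variables of integration. Since the Jacobian of u, v with respect
to $\xi, \eta$ is $\sqrt{\Delta}$, the Jacobian of $\xi, \eta$ with respect to u, v will be $1/\sqrt{\Delta}$ and,
by the known rules

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-a\xi^2 - 2b\xi\eta - c\eta^2}\,d\xi\,d\eta = \frac{1}{\sqrt{\Delta}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-u^2 - v^2}\,du\,dv = \frac{\pi}{\sqrt{\Delta}}.$$

Thus

$$K = \frac{\sqrt{\Delta}}{\pi} = \frac{\sqrt{ac - b^2}}{\pi}.$$

That is, the general expression for the density of probability in
two-dimensional normal distribution is

$$\frac{\sqrt{ac - b^2}}{\pi}\,e^{-a(x - x_0)^2 - 2b(x - x_0)(y - y_0) - c(y - y_0)^2}.$$
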


2. Parameters $x_0, y_0$ represent the mean values of variables x, y.
To prove this, let us consider

$$E(x - x_0) = \frac{\sqrt{\Delta}}{\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - x_0)\,e^{-a(x - x_0)^2 - 2b(x - x_0)(y - y_0) - c(y - y_0)^2}\,dx\,dy.$$

To evaluate the double integral, we can express x and y through new
variables u, v introduced in the preceding section. We have

$$x - x_0 = \frac{\delta u - \beta v}{\sqrt{\Delta}}$$

and

$$y - y_0 = \frac{-\gamma u + \alpha v}{\sqrt{\Delta}}$$

whence

$$E(x - x_0) = \frac{1}{\pi\sqrt{\Delta}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (\delta u - \beta v)\,e^{-u^2 - v^2}\,du\,dv = 0,$$

that is,

$$E(x) = x_0,$$

and similarly

$$E(y) = y_0.$$



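3. Having found the meaning of $x_0, y_0$ we may consider instead of x, y,
variables $x - x_0$, $y - y_0$ whose mean values = 0. Denoting these new
variables by x, y again, the expression of the density of probability for
x, y will be:

$$\frac{\sqrt{ac - b^2}}{\pi}\,e^{-ax^2 - 2bxy - cy^2}.$$

It contains only three parameters, a, b, c. To find the intrinsic meaning
of a, b, c let us consider the mathematical expectation of $(x + \lambda y)^2$
where $\lambda$ is an arbitrary constant. We have

$$E(x + \lambda y)^2 = \frac{\sqrt{\Delta}}{\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x + \lambda y)^2 e^{-ax^2 - 2bxy - cy^2}\,dx\,dy$$

or, introducing u, v defined as in Sec. 1 as new variables of integration,

$$E(x + \lambda y)^2 = \frac{1}{\pi\Delta}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left[(\delta - \lambda\gamma)^2u^2 + 2(\delta - \lambda\gamma)(-\beta + \lambda\alpha)uv + (\beta - \lambda\alpha)^2v^2\right]e^{-u^2 - v^2}\,du\,dv$$

$$= \frac{1}{\pi\Delta}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left[(\delta - \lambda\gamma)^2 + (\beta - \lambda\alpha)^2\right]u^2e^{-u^2 - v^2}\,du\,dv = \frac{\delta^2 + \beta^2}{2\Delta} - 2\lambda\frac{\alpha\beta + \gamma\delta}{2\Delta} + \lambda^2\frac{\gamma^2 + \alpha^2}{2\Delta}.$$

But

$$\delta^2 + \beta^2 = c, \qquad \gamma^2 + \alpha^2 = a, \qquad \alpha\beta + \gamma\delta = b,$$

whence

$$E(x^2) + 2\lambda E(xy) + \lambda^2E(y^2) = \frac{c}{2\Delta} - 2\lambda\frac{b}{2\Delta} + \lambda^2\frac{a}{2\Delta},$$

and since $\lambda$ is arbitrary

$$E(x^2) = \frac{c}{2\Delta}, \qquad E(xy) = -\frac{b}{2\Delta}, \qquad E(y^2) = \frac{a}{2\Delta}.$$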
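On the other hand, if $\sigma_1$, $\sigma_2$, and r are respectively standard deviations
of x, y and their correlation coefficient, we have

$$E(x^2) = \sigma_1^2, \qquad E(xy) = r\sigma_1\sigma_2, \qquad E(y^2) = \sigma_2^2.$$

Hence

$$\frac{c}{2\Delta} = \sigma_1^2, \qquad \frac{a}{2\Delta} = \sigma_2^2, \qquad \frac{b}{2\Delta} = -r\sigma_1\sigma_2$$

and

$$\frac{1}{4\Delta} = \sigma_1^2\sigma_2^2(1 - r^2)$$

or

$$2\Delta = \frac{1}{2\sigma_1^2\sigma_2^2(1 - r^2)}.$$

Finally,

$$a = \frac{1}{2\sigma_1^2(1 - r^2)}, \qquad b = -\frac{r}{2\sigma_1\sigma_2(1 - r^2)}, \qquad c = \frac{1}{2\sigma_2^2(1 - r^2)}, \qquad \sqrt{\Delta} = \frac{1}{2\sigma_1\sigma_2\sqrt{1 - r^2}}.$$

With these values for a, b, c, and $\sqrt{\Delta}$ the density of probability can
be presented as follows:

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\,e^{-\frac{1}{2(1 - r^2)}\left[\left(\frac{x}{\sigma_1}\right)^2 - 2r\frac{x}{\sigma_1}\frac{y}{\sigma_2} + \left(\frac{y}{\sigma_2}\right)^2\right]}$$

and the probability for a point x, y to belong to a given domain D will be
expressed by the double integral

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\iint_{(D)} e^{-\frac{1}{2(1 - r^2)}\left[\left(\frac{x}{\sigma_1}\right)^2 - 2r\frac{x}{\sigma_1}\frac{y}{\sigma_2} + \left(\frac{y}{\sigma_2}\right)^2\right]}\,dx\,dy$$

extended over D.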
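
The relations of Secs. 1 and 3 between the coefficients a, b, c and the quantities $\sigma_1$, $\sigma_2$, r can be checked numerically. The following sketch is only an illustration: the function names and the sample values (2, 3, 0.5) are assumptions made here, not part of the text. It maps $(\sigma_1, \sigma_2, r)$ to (a, b, c) and then recovers the second moments from $c/2\Delta$, $-b/2\Delta$, $a/2\Delta$.

```python
import math

def form_coefficients(sigma1, sigma2, r):
    """Coefficients a, b, c of the exponent a*x^2 + 2*b*x*y + c*y^2 (Sec. 3)."""
    d = 2.0 * (1.0 - r * r)
    return 1.0 / (d * sigma1 ** 2), -r / (d * sigma1 * sigma2), 1.0 / (d * sigma2 ** 2)

def second_moments(a, b, c):
    """E(x^2), E(xy), E(y^2) recovered from a, b, c with Delta = a*c - b^2."""
    delta = a * c - b * b
    return c / (2.0 * delta), -b / (2.0 * delta), a / (2.0 * delta)

def normalizing_constant(a, b, c):
    """K = sqrt(Delta) / pi, as found in Sec. 1."""
    return math.sqrt(a * c - b * b) / math.pi

a, b, c = form_coefficients(2.0, 3.0, 0.5)
print(second_moments(a, b, c))        # should reproduce sigma1^2, r*sigma1*sigma2, sigma2^2
print(normalizing_constant(a, b, c))  # should equal 1 / (2*pi*sigma1*sigma2*sqrt(1 - r^2))
```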
4. Curves

$$\frac{1}{2(1 - r^2)}\left[\left(\frac{x}{\sigma_1}\right)^2 - 2r\frac{x}{\sigma_1}\frac{y}{\sigma_2} + \left(\frac{y}{\sigma_2}\right)^2\right] = l = \text{const.}$$

are evidently similar and similarly placed ellipses with the common
center at the origin. For obvious reasons they are called ellipses of
equal probability. The area of an ellipse corresponding to a given value
of l (ellipse l) is

$$\frac{\pi l}{\sqrt{\Delta}} = 2\pi l\sigma_1\sigma_2\sqrt{1 - r^2},$$

whence the area of an infinitesimal ring between ellipses l and l + dl
has the expression

$$2\pi\sigma_1\sigma_2\sqrt{1 - r^2}\,dl.$$

The infinitesimal probability for a point x, y to lie in that ring is
expressed by

$$e^{-l}\,dl.$$

Finally, by integrating this expression between limits $l_1$ and $l_2 > l_1$, we
get

$$e^{-l_1} - e^{-l_2}$$

as the expression of the probability for x, y to belong to the ring between
two ellipses $l_1$ and $l_2$. If $l_1 = 0$ and $l_2 = l$,

$$1 - e^{-l}$$

gives the probability for x, y to belong to the ellipse l.

If n numbers $l, l_1, l_2, \ldots l_{n-1}$ are determined by the conditions

$$1 - e^{-l} = e^{-l} - e^{-l_1} = \cdots = e^{-l_{n-2}} - e^{-l_{n-1}} = \frac{1}{n + 1}$$

the whole plane is divided into n + 1 regions of equal probability:
namely, the interior of the ellipse l, rings between $l, l_1$; $l_1, l_2$; . . . $l_{n-2}, l_{n-1}$
and, finally, part of the plane outside of the ellipse $l_{n-1}$.
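
The levels $l, l_1, \ldots l_{n-1}$ can be written out explicitly, since the ellipse of level L encloses the probability $1 - e^{-L}$. The sketch below is an illustration with hypothetical function names; it solves $1 - e^{-L_j} = (j + 1)/(n + 1)$ for each level and then verifies that the interior, the rings, and the exterior all carry probability 1/(n + 1).

```python
import math

def equal_probability_levels(n):
    """Levels l, l_1, ..., l_{n-1}; the j-th ellipse encloses probability (j + 1)/(n + 1)."""
    return [math.log((n + 1) / (n - j)) for j in range(n)]

def region_probabilities(levels):
    """Probabilities of the interior, the successive rings, and the exterior."""
    inside = [1.0 - math.exp(-level) for level in levels]  # 1 - e^{-l} for each ellipse
    bounds = [0.0] + inside + [1.0]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]

levels = equal_probability_levels(4)
print(levels)
print(region_probabilities(levels))  # five regions, each of probability close to 0.2
```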

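5. To find the distribution function of the variable x (without any
regard to y), we must take for D the domain

$$-\infty < x < t; \qquad -\infty < y < +\infty.$$

As the integral

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\int_{-\infty}^{t}\int_{-\infty}^{\infty} e^{-\frac{1}{2(1 - r^2)}\left[\left(\frac{x}{\sigma_1}\right)^2 - 2r\frac{x}{\sigma_1}\frac{y}{\sigma_2} + \left(\frac{y}{\sigma_2}\right)^2\right]}\,dx\,dy =$$

$$= \frac{1}{2\pi\sigma_1\sqrt{1 - r^2}}\int_{-\infty}^{t} e^{-\frac{x^2}{2\sigma_1^2}}\,dx\int_{-\infty}^{\infty} e^{-\frac{v^2}{2(1 - r^2)}}\,dv = \frac{1}{\sigma_1\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{x^2}{2\sigma_1^2}}\,dx$$

(the inner integral being taken with respect to $v = \frac{y}{\sigma_2} - r\frac{x}{\sigma_1}$),
we see that the probability of the inequality

$$x < t$$

is expressed by

$$\frac{1}{\sigma_1\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{x^2}{2\sigma_1^2}}\,dx.$$

Similarly, the probability of the inequality

$$y < t$$

is

$$\frac{1}{\sigma_2\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{y^2}{2\sigma_2^2}}\,dy.$$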

Thus, if two variables x, y are normally distributed with their
means = 0, each one of them taken separately has a normal distribution
of probability with the common mean 0 and the respective standard
deviations $\sigma_1$ and $\sigma_2$. Variables x and y are not independent except when
r = 0. For if they were independent the probability of the point
x, y belonging to an infinitesimal rectangle

$$t < x < t + dt; \qquad \tau < y < \tau + d\tau$$

would be

$$\frac{1}{2\pi\sigma_1\sigma_2}\,e^{-\frac{t^2}{2\sigma_1^2} - \frac{\tau^2}{2\sigma_2^2}}\,dt\,d\tau$$

whereas it is

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\,e^{-\frac{1}{2(1 - r^2)}\left[\left(\frac{t}{\sigma_1}\right)^2 - 2r\frac{t}{\sigma_1}\frac{\tau}{\sigma_2} + \left(\frac{\tau}{\sigma_2}\right)^2\right]}\,dt\,d\tau$$

and these expressions are different unless r = 0. Thus, except for r = 0,
normally distributed variables are necessarily dependent in the sense
of the theory of probability. Dependent variables are often called
"correlated variables." In particular, variables are said to be in "normal
correlation" when they are normally distributed.

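6. The probability of simultaneous inequalities

$$X < x < X'; \qquad y < t$$

is represented by the repeated integral

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\int_{X}^{X'} dx\int_{-\infty}^{t} e^{-\frac{1}{2(1 - r^2)}\left[\left(\frac{x}{\sigma_1}\right)^2 - 2r\frac{x}{\sigma_1}\frac{y}{\sigma_2} + \left(\frac{y}{\sigma_2}\right)^2\right]}\,dy$$

while

$$\frac{1}{\sigma_1\sqrt{2\pi}}\int_{X}^{X'} e^{-\frac{x^2}{2\sigma_1^2}}\,dx$$

is the probability that x will be contained between X and X'. Hence
(Chap. XII, Sec. 10) the ratio

$$\frac{1}{\sigma_2\sqrt{2\pi(1 - r^2)}}\cdot\frac{\displaystyle\int_{X}^{X'} e^{-\frac{x^2}{2\sigma_1^2}}\,dx\int_{-\infty}^{t} e^{-\frac{1}{2(1 - r^2)}\left(\frac{y}{\sigma_2} - r\frac{x}{\sigma_1}\right)^2}\,dy}{\displaystyle\int_{X}^{X'} e^{-\frac{x^2}{2\sigma_1^2}}\,dx}$$

can be considered as the probability of the inequality

$$y < t$$

it being known that x is contained between X and X'. Considering X' as
variable and converging to X the above ratio evidently tends to the
limit

$$\frac{1}{\sigma_2\sqrt{2\pi(1 - r^2)}}\int_{-\infty}^{t} e^{-\frac{1}{2\sigma_2^2(1 - r^2)}\left[y - r\frac{\sigma_2}{\sigma_1}X\right]^2}\,dy$$

which can be considered as the distribution function of y when x has a
fixed value X. Hence, y for x = X has a normal distribution with the
standard deviation

$$\sigma_2\sqrt{1 - r^2}$$

and the mean

$$Y = r\frac{\sigma_2}{\sigma_1}X.$$

Interpreted geometrically, this equation represents the so-called
"line of regression" of y on x.

In a similar way, we conclude that for y = Y the distribution of x
is normal with the standard deviation

$$\sigma_1\sqrt{1 - r^2}$$

and the mean

$$X = r\frac{\sigma_1}{\sigma_2}Y.$$

This equation represents the line of regression of x on y.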
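
The statement about the distribution of y for a fixed x = X can be checked numerically from the joint density alone. The sketch below is an illustration under assumed values ($\sigma_1 = 2$, $\sigma_2 = 3$, r = 0.6, X = 1.5) and with hypothetical function names; it computes the mean and standard deviation of y along the line x = X by a midpoint sum and compares them with $r\frac{\sigma_2}{\sigma_1}X$ and $\sigma_2\sqrt{1 - r^2}$.

```python
import math

def joint_density(x, y, s1, s2, r):
    """Two-dimensional normal density with zero means (Sec. 3)."""
    q = (x / s1) ** 2 - 2.0 * r * (x / s1) * (y / s2) + (y / s2) ** 2
    norm = 2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - r * r)
    return math.exp(-q / (2.0 * (1.0 - r * r))) / norm

def conditional_moments(x_fixed, s1, s2, r, half_width=12.0, steps=4000):
    """Mean and standard deviation of y for x = x_fixed, by a midpoint sum over y.

    The window [-half_width*s2, half_width*s2] is assumed wide enough to hold
    practically all of the mass, which is the case for moderate x_fixed.
    """
    lo = -half_width * s2
    h = 2.0 * half_width * s2 / steps
    ys = [lo + (i + 0.5) * h for i in range(steps)]
    ws = [joint_density(x_fixed, y, s1, s2, r) for y in ys]
    total = sum(ws)
    mean = sum(y * w for y, w in zip(ys, ws)) / total
    var = sum((y - mean) ** 2 * w for y, w in zip(ys, ws)) / total
    return mean, math.sqrt(var)

mean, sd = conditional_moments(1.5, s1=2.0, s2=3.0, r=0.6)
print(mean, 0.6 * 3.0 / 2.0 * 1.5)          # numerical mean vs. r*(s2/s1)*X
print(sd, 3.0 * math.sqrt(1.0 - 0.6 ** 2))  # numerical s.d. vs. s2*sqrt(1 - r^2)
```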

LIMIT THEOREM FOR SUMS OF INDEPENDENT VECTORS 

7. So far normal distribution in two dimensions has been considered 
abstractly without indication of its natural origin. One-dimensional 
normal distribution may be considered as a limiting case of probability 
distributions of sums of independent variables. In the same manner 
two-dimensional normal distribution or normal correlation appears as a 
limit of probability distributions of sums of independent vectors. 

Two series of stochastic variables

$$x_1, x_2, \ldots x_n$$

$$y_1, y_2, \ldots y_n$$

define n stochastic vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots \mathbf{v}_n$ so that $x_i, y_i$ represent
components of $\mathbf{v}_i$ on two fixed coordinate axes. If

$$E(x_i) = a_i, \qquad E(y_i) = b_i$$

the vector $\mathbf{a}_i$ with the components $a_i, b_i$ is called the mean value of $\mathbf{v}_i$.
Evidently the mean value of

$$\mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2 + \cdots + \mathbf{v}_n$$

is represented by the vector

$$\mathbf{a} = \mathbf{a}_1 + \mathbf{a}_2 + \cdots + \mathbf{a}_n$$

and that of $\mathbf{v} - \mathbf{a}$ is a vanishing vector. Without loss of generality
we may assume at the outset that

$$E(x_i) = E(y_i) = 0; \qquad i = 1, 2, \ldots n,$$

in which case $E(\mathbf{v}) = 0$. Vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots \mathbf{v}_n$ are said to be
independent if variables $x_i, y_i$ are independent of the rest of the variables
$x_j, y_j$ where $j \ne i$.

In what follows we shall deal exclusively with independent vectors. 

8. As before, let $x_k, y_k$ be components of the vector

$$\mathbf{v}_k \quad (k = 1, 2, \ldots n).$$

Then

$$X = x_1 + x_2 + \cdots + x_n$$

$$Y = y_1 + y_2 + \cdots + y_n$$

will be the components of the sum

$$\mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2 + \cdots + \mathbf{v}_n.$$

If

$$E(x_k) = E(y_k) = 0$$

$$E(x_k^2) = b_k, \qquad E(y_k^2) = c_k, \qquad E(x_ky_k) = d_k$$

then

$$E(X) = 0, \qquad E(Y) = 0$$

$$E(X^2) = b_1 + b_2 + \cdots + b_n = B_n$$

$$E(Y^2) = c_1 + c_2 + \cdots + c_n = C_n$$

$$E(XY) = d_1 + d_2 + \cdots + d_n = r_n\sqrt{B_n}\sqrt{C_n}$$

because

$$E(x_iy_j) = 0 \quad \text{if} \quad j \ne i,$$

variables $x_i$ and $y_j$ being independent.

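Let us introduce instead of variables $x_k, y_k$ (k = 1, 2, . . . n) new
variables

$$\xi_k = \frac{x_k}{\sqrt{B_n}}, \qquad \eta_k = \frac{y_k}{\sqrt{C_n}}$$

and correspondingly

$$s = \frac{X}{\sqrt{B_n}}, \qquad \sigma = \frac{Y}{\sqrt{C_n}}$$

instead of X, Y. We shall have:

$$E(\xi_k) = E(\eta_k) = 0$$

$$E(\xi_k^2) = \frac{b_k}{B_n}, \qquad E(\eta_k^2) = \frac{c_k}{C_n}$$

and

$$E(s) = E(\sigma) = 0$$

$$E(s^2) = E(\sigma^2) = 1$$

$$E(s\sigma) = r_n.$$

The quantity $r_n$, the correlation coefficient of s and $\sigma$, is in absolute value
$\le 1$. We define

$$\varphi(u, v) = E\left(e^{i(us + v\sigma)}\right)$$

as the characteristic function of the vector s, $\sigma$. Evidently $\varphi(u, 0)$ and
$\varphi(0, v)$ are respectively the characteristic functions of s and $\sigma$. Since

$$e^{i(us + v\sigma)} = e^{i(u\xi_1 + v\eta_1)} \cdot e^{i(u\xi_2 + v\eta_2)} \cdots e^{i(u\xi_n + v\eta_n)}$$

and the factors in the right-hand member represent independent variables,
we shall have

$$\varphi(u, v) = E\left(e^{i(u\xi_1 + v\eta_1)}\right) \cdot E\left(e^{i(u\xi_2 + v\eta_2)}\right) \cdots E\left(e^{i(u\xi_n + v\eta_n)}\right).$$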

9. For what follows it is very important to investigate the behavior
of $\varphi(u, v)$ when n increases indefinitely while u, v do not exceed an
arbitrary but fixed number l in absolute value.

Let

$$E|x_k|^3 = f_k, \qquad E|y_k|^3 = g_k$$

and

$$\frac{f_1 + f_2 + \cdots + f_n}{B_n^{3/2}} = \omega_n, \qquad \frac{g_1 + g_2 + \cdots + g_n}{C_n^{3/2}} = \eta_n.$$

If $\omega_n$ and $\eta_n$ tend to 0 as $n \to \infty$, we shall have

( 1 ) \<t>{u, v) - <; - 1 

provided 

\u\ ^ i, |e;| S I 
and n is so large as to make


+ iji) < 1. 


= 1 + + vrjk) — -^{u^k + vtikY + 


e. |<,| < 


we shall have: 




+ |-£|«£. + i»'| < 1. 


On the other hand, 


1_^^2-_ uv -= e 2C»' , 

2£,“ 2V^ 2C/ + 


+ | 0 "| < 1 


and so 


hk , 2dk Ck . 

—— ^ ■ " II 1 121) ^ ^ ^ 

£;(e«“f*+»->»)) = e 2va(„c„ 2 C. ^ j[E{u^k + + 


+ "^E\u^k + vriklK 


Furthermore, 


E(uik + VVk)^ S + 2o)\r)l + rjl) < 1 


because 


Eia) col Eivl) = ^ < 


[E{u^k + vvk^V < lE(u^k + vvk^Y ^ E\u^, + i;77,|^ 
E\u^k + VrjkY ^ 

Taking into account these various inequalities, we may write 


^(gt(ll€*+vi7*)) s= e 2y/BnCn 2 n Q ^ 



where 


Finally, 



<i>{u, V) = + ^^)(1 + ^ 2 ) . . . (1 + 


and 


V) — J(uH-2rnUu4-tJ*) I < 1 ;^ g|«ri|+j<yj|4* • • * 4-|<rnl — 1 < — 1 

as was stated. 

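10. Theorem. Let P denote the probability of simultaneous inequalities

$$t_0 \le s < t_1; \qquad \tau_0 \le \sigma < \tau_1.$$

Provided $r_n$ remains less than a fixed number $\alpha < 1$ in absolute value and
the above introduced quantities $\omega_n$, $\eta_n$ tend to 0 as $n \to \infty$, P can be expressed
as

$$P = \frac{1}{2\pi\sqrt{1 - r_n^2}}\int_{t_0}^{t_1}\int_{\tau_0}^{\tau_1} e^{-\frac{1}{2(1 - r_n^2)}(t^2 - 2r_nt\tau + \tau^2)}\,dt\,d\tau + \Delta_n$$

where $\Delta_n$ tends to 0 uniformly in $t_0, t_1$; $\tau_0, \tau_1$.

If, in addition, $r_n$ itself tends to the limit r (|r| < 1), P will tend uniformly
to

$$\frac{1}{2\pi\sqrt{1 - r^2}}\int_{t_0}^{t_1}\int_{\tau_0}^{\tau_1} e^{-\frac{1}{2(1 - r^2)}(t^2 - 2rt\tau + \tau^2)}\,dt\,d\tau.$$
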


Proof. a. In trying to extend Liapounoff's proof to the present case
we introduce an auxiliary quantity $\Pi$ defined as



Using the inequality 


—r 


e-‘'dt 



for 


a: > 0, 


one can easily derive the following inequalities: 
if

$$(3) \qquad t_0 \le s < t_1; \qquad \tau_0 \le \sigma < \tau_1,$$

and


(4) 


jL Te -(V)’,, < Ue-(^y + c-('T)’ + 

tJto c/TO 

_ / n — ty ’X ® __ / TO —g \ »\ 

+ e \ h J J 

if at least one of the inequalities (3) is not fulfilled. From the definition 
of n, P and from (2) and (4) it follows that 

/ _ / ^1 ^ _ / to — g \ 2 ^ / T 1 ~rr \ 2 ✓iro — <r\ 2\ 

|P - n| < |PVe V * y + e V /. ; + c K T-) + ) ), 

But referring to (1) and setting 

c4;>(„„+„) _ 1 

we have by virtue of the developments in Chap. XIV, Sec. 3, 

(f)' 


( 6 ) 


|P - n| < 2 a „(0 + hV2 + 


8 c 


■\/t 


b. Replacing $t_1, \tau_1$ by variable quantities t, $\tau$ and taking the second
derivative of $\Pi$ with respect to t and $\tau$, we get


dtdr 




On the other hand 


4r’ 


-e 

TT 


^00 ^00 _ 2 1 S'! 

J — »J— « 


whence 

Here we substitute 

V) = 6-Fu*+2rnur+t,«) 

For all real u, v 

\g(u, v)| ^2. 

If $|u| \le l$, $|v| \le l$ where l is an arbitrarily fixed number, and n is large
enough, we have


\g{u, v)\ g a-nil). 
Hence, the double integral


J* ^ ^ v)d'udv 

extended over the region outside of the square $|u| \le l$, $|v| \le l$ is less than






e ^ rdr < 


hlP 

4 


in absolute value. The same double integral extended over the square
$|u| \le l$, $|v| \le l$ is less than

in absolute value. Thus, referring to (6) 

• 00 f* tc 


dm 

dtdr 


= I I ^ ^ ^ + R 


and 


hm 


\R\ < -L„(/) + 




Now 




(u«+t>*) 


= l_^(„2 + ^2); |X|<1 


and 


rs 


^i(ui+2rnuv+v*) _[_ ^^^dudv = 




< 




47r(l - r2)i ^ 47r(l - 


Hence 


and 


d^n 

dtdr 


1 /* * ^00 
47rV- 00 


m* 

Ifl'l < ^ + 4^(1 i 

By transformation to new variables

$$\xi = u + r_nv; \qquad \eta = v\sqrt{1 - r_n^2}$$

the foregoing double integral becomes




1 , . T — trn 

2 2 .g y/i-rn* d^dr] = 


-i- 

2xVl - rf 


SO that finally 


dtdr 27r'\/l 




Integrating this expression with respect to t and $\tau$ between limits $t_0, t_1$
and $\tau_0, \tau_1$, we get:


27r\/l - 


f 

1 — rlJto Jto 




where 


{hl)^ 

( 8 ) | p | < (^1 — ^( 7*1 — '^ o ) \~ 2 ^ n { l ) + ^-^2 - 1 “ 


47r(l — a^)' 


Hence combining inequality (5) with (7) and (8), 


2'7r\/1 


HJ7" 

1 — riJta Jto 


where 


|An| < 2 + ^(^1 ■“ io)(Tl — To) an(0 + <| 


C 2(1-r«') 


(A0» 

e ^ ( S h 


\v^Z 


- T„)| + /lV2 


+ (b - <o)(r. - r„)l + /lV2 + - ;") l^ 

^ 47 r(l - a 2)2 

Considering $t_0, t_1$; $\tau_0, \tau_1$ as variable and denoting an arbitrarily large
number by L, we shall assume at first that the rectangle D

$$t_0 \le s \le t_1; \qquad \tau_0 \le \sigma \le \tau_1$$

is completely contained in the square Q:

$$|s| \le L, \qquad |\sigma| \le L.$$

Then, taking h = l~^‘‘ we shall have 

|A.| < + + V2i"* + f 
Given an arbitrary positive number $\varepsilon$, we take l so large as to have

+ 4lA + V2r^ + —.3 < L 

/ ir(l - 

After that, since a„(Z) —> 0 as n —> «> (for a fixed 1) we can find a number 
no(€) so that 

< <(2 +1^-)- 

for $n > n_0(\varepsilon)$. Finally, we shall have

$$|\Delta_n| < \varepsilon$$

as soon as $n > n_0(\varepsilon)$; that is, $\Delta_n$ tends to 0 uniformly in any rectangle D
contained in the square Q with an arbitrarily large side 2L.

c. To prove that $\Delta_n$ tends to 0 uniformly no matter what are $t_0, t_1$;
$\tau_0, \tau_1$ we observe that the integral

$$\frac{1}{2\pi\sqrt{1 - r_n^2}}\iint e^{-\frac{1}{2(1 - r_n^2)}(t^2 - 2r_nt\tau + \tau^2)}\,dt\,d\tau$$

extended over the area outside of Q becomes infinitesimal as $L \to \infty$.
Accordingly, we take L so large as to make this integral $< \varepsilon/2$ (no matter
what n is) and in addition to have $L^{-2} < \varepsilon/4$. The number L selected
according to these requirements will be kept fixed.

Let D' represent that part of D which is inside Q, the remaining part or
parts (if there are any) being D''. Let P' and P'' denote the probabilities
that the point s, $\sigma$ shall be contained in D' or D'', respectively. Also,
let J' and J'' be the integrals

$$\frac{1}{2\pi\sqrt{1 - r_n^2}}\iint e^{-\frac{1}{2(1 - r_n^2)}(t^2 - 2r_nt\tau + \tau^2)}\,dt\,d\tau$$

extended over D' and D'', respectively. By what has been proved, given
$\varepsilon > 0$ a number $n_0(\varepsilon)$ can be found so that

$$|P' - J'| < \varepsilon$$

for $n > n_0(\varepsilon)$. Now

$$P = P' + P''; \qquad J = J' + J''$$

whence

$$|P - J| < \varepsilon + P'' + J''$$

for $n > n_0(\varepsilon)$. Since by Tshebysheff's lemma (Chap. X, Sec. 1) the
probability of either one of the inequalities

$$|s| > L \quad \text{or} \quad |\sigma| > L$$

is less than $1/L^2$, we shall have

$$P'' < \frac{2}{L^2} < \frac{\varepsilon}{2}.$$

Also,

$$J'' < \frac{\varepsilon}{2}$$

whence

$$|P - J| < 2\varepsilon$$

for $n > n_0(\varepsilon)$; that is, the difference

$$P - \frac{1}{2\pi\sqrt{1 - r_n^2}}\int_{t_0}^{t_1}\int_{\tau_0}^{\tau_1} e^{-\frac{1}{2(1 - r_n^2)}(t^2 - 2r_nt\tau + \tau^2)}\,dt\,d\tau$$

tends to 0 uniformly, no matter what $t_0, t_1$; $\tau_0, \tau_1$ are.

Finally, the last statement of the theorem appears as almost evident
and does not require an elaborate proof.

11. The theorem just proved concerns the asymptotic behavior of
the probability P of simultaneous inequalities

$$t_0 \le s < t_1; \qquad \tau_0 \le \sigma < \tau_1$$

which, due to the definition of s and $\sigma$, are equivalent to the inequalities

$$t_0\sqrt{B_n} \le x_1 + x_2 + \cdots + x_n < t_1\sqrt{B_n}$$

$$\tau_0\sqrt{C_n} \le y_1 + y_2 + \cdots + y_n < \tau_1\sqrt{C_n}.$$

From the geometrical standpoint the above domain of s, $\sigma$ is a rectangle.
But the theorem can be extended to the case of any given
domain R for the point s, $\sigma$. It is hardly necessary to enter into details
of the proof based on the definition of a double integral. It suffices to
state the theorem itself:

Fundamental Theorem. The probability for the point (s, $\sigma$) to be
located in a given domain R can be represented, for large n, by the integral

$$\frac{1}{2\pi\sqrt{1 - r_n^2}}\iint_{(R)} e^{-\frac{1}{2(1 - r_n^2)}(s^2 - 2r_ns\sigma + \sigma^2)}\,ds\,d\sigma$$

extended over R, with an error which tends uniformly to 0 as n becomes
infinite, provided

$$\omega_n \to 0, \qquad \eta_n \to 0,$$

while for all n

$$|r_n| < \alpha < 1.$$

In less precise terms we may say that under very general conditions
the probability distribution of the components of a vector which is the
sum of a great many independent vectors will be nearly normal.

The first rigorous proof of the limit theorem for sums of independent
vectors was published by S. Bernstein in 1926. Like the proof developed
here it proceeds on the same lines as Liapounoff's proof for sums of
independent variables. Moreover, Bernstein has shown that the limit
theorem may hold even in case of dependent vectors when certain
additional conditions are fulfilled.

12. A good illustration of the fundamental theorem is afforded by
series of independent trials with three alternatives, E, F, G. For the
sake of simplicity we shall assume that probabilities of E, F, G are
p, q, r in all trials. Naturally

$$p + q + r = 1.$$

In the usual way, we associate with these trials triads of variables

$$x_i, y_i, z_i \quad (i = 1, 2, 3, \ldots)$$

so that

$x_i$ = 1 or 0 according as E occurs or fails at the ith trial;

$y_i$ = 1 or 0 according as F occurs or fails at the ith trial;

$z_i$ = 1 or 0 according as G occurs or fails at the ith trial.

Evidently

$$E(x_i) = E(x_i^2) = p$$

$$E(y_i) = E(y_i^2) = q$$

so that vectors $\mathbf{v}_i$ with components

$$\xi_i = x_i - p, \qquad \eta_i = y_i - q$$

have their means = 0. The independence of trials involves the
independence of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots \mathbf{v}_n$. Hence we can apply the preceding
considerations to the vector

$$\mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2 + \cdots + \mathbf{v}_n$$

with the components

$$X = \xi_1 + \xi_2 + \cdots + \xi_n$$

$$Y = \eta_1 + \eta_2 + \cdots + \eta_n.$$

We have

$$B_n = E(X^2) = np(1 - p); \qquad C_n = E(Y^2) = nq(1 - q).$$

Moreover,

$$E(\xi_i\eta_i) = E(x_iy_i) - pq = -pq$$

and

$$E(XY) = r_n\sqrt{B_n}\sqrt{C_n} = -npq$$

whence

$$r_n = -\frac{pq}{\sqrt{pq(1 - p)(1 - q)}}.$$

The quantities denoted by $f_k, g_k$ in Sec. 9 are in our case

$$f_k = E|\xi_k|^3 = p(1 - p)^3 + (1 - p)p^3$$

$$g_k = E|\eta_k|^3 = q(1 - q)^3 + (1 - q)q^3.$$

Hence

$$\omega_n = \frac{p(1 - p)^3 + (1 - p)p^3}{n^{\frac{1}{2}}p^{\frac{3}{2}}(1 - p)^{\frac{3}{2}}}$$

$$\eta_n = \frac{q(1 - q)^3 + (1 - q)q^3}{n^{\frac{1}{2}}q^{\frac{3}{2}}(1 - q)^{\frac{3}{2}}}$$

and the conditions

$$\omega_n \to 0, \qquad \eta_n \to 0$$

are satisfied. The fundamental theorem, therefore, can be applied.

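
The moments used above can be verified by enumerating the three outcomes of a single trial. The sketch below is illustrative only: the function names and the sample probabilities p = 0.2, q = 0.3 are assumptions made here. It computes $E(\xi^2)$, $E(\eta^2)$, $E(\xi\eta)$ directly and compares the resulting correlation coefficient with $-pq/\sqrt{pq(1 - p)(1 - q)}$.

```python
import math

def trinomial_moments(p, q):
    """Per-trial E(xi^2), E(eta^2), E(xi*eta) for xi = x - p, eta = y - q."""
    rest = 1.0 - p - q  # probability of the third alternative G
    outcomes = [(p, 1, 0), (q, 0, 1), (rest, 0, 0)]  # (probability, x, y)
    b = sum(w * (x - p) ** 2 for w, x, y in outcomes)
    c = sum(w * (y - q) ** 2 for w, x, y in outcomes)
    d = sum(w * (x - p) * (y - q) for w, x, y in outcomes)
    return b, c, d

def correlation(p, q):
    """Correlation coefficient r_n of X and Y; it does not depend on n."""
    b, c, d = trinomial_moments(p, q)
    return d / math.sqrt(b * c)

p, q = 0.2, 0.3
print(trinomial_moments(p, q))  # should match p(1-p), q(1-q), -pq
print(correlation(p, q), -p * q / math.sqrt(p * q * (1 - p) * (1 - q)))
```

Since every trial has the same distribution, the sums $B_n$, $C_n$ and $E(XY)$ are n times the per-trial values, and the factor n cancels in $r_n$.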
If k, l, m are the respective frequencies of events E, F, G in n trials, the
quantities X and Y represent the discrepancies

$$\lambda = k - np, \qquad \mu = l - nq.$$

Introducing the third discrepancy

$$\nu = m - nr$$

we shall have

$$\lambda + \mu + \nu = 0$$

so that $\nu$ is determined when $\lambda$ and $\mu$ are given. The last two quantities,
however, may have various values depending on chance. Concerning
them the following statement follows from the fundamental theorem:

Theorem. The probability that discrepancies $\lambda$, $\mu$ in n trials shall
simultaneously satisfy the inequalities

$$\alpha_0\sqrt{n} < \lambda < \alpha_1\sqrt{n}; \qquad \beta_0\sqrt{n} < \mu < \beta_1\sqrt{n}$$

tends uniformly, with indefinitely increasing n, to the limit

$$\frac{1}{2\pi\sqrt{pqr}}\int_{\alpha_0}^{\alpha_1}\int_{\beta_0}^{\beta_1} e^{-\frac{1}{2}\left(\frac{\alpha^2}{p} + \frac{\beta^2}{q} + \frac{\gamma^2}{r}\right)}\,d\alpha\,d\beta$$

where, to have symmetrical notation, $\gamma$ is a variable defined by

$$\alpha + \beta + \gamma = 0.$$

On account of symmetry, perfectly similar statements can be made in
regard to any pair of the discrepancies $\lambda$, $\mu$, $\nu$.

Since the fundamental theorem and its proof can be extended without
any difficulty to vectors of more than two dimensions, we shall have
in the case of trials with more than three alternatives a result perfectly
analogous to the last theorem.

Theorem. Each of n independent trials admits of k alternatives $E_1$,
$E_2, \ldots E_k$ the probabilities and the frequencies of which respectively are
$p_1, p_2, \ldots p_k$ and $m_1, m_2, \ldots m_k$. The probability that the
discrepancies $m_i - np_i$ (i = 1, 2, . . . k − 1) should satisfy simultaneously the
inequalities

$$\alpha_i\sqrt{n} < m_i - np_i < \beta_i\sqrt{n}$$

tends uniformly, with indefinitely increasing n, to the limit

$$\frac{1}{(2\pi)^{\frac{k-1}{2}}\sqrt{p_1p_2 \cdots p_k}}\int_{\alpha_1}^{\beta_1} \cdots \int_{\alpha_{k-1}}^{\beta_{k-1}} e^{-\frac{1}{2}\sum_{i=1}^{k}\frac{t_i^2}{p_i}}\,dt_1\,dt_2 \cdots dt_{k-1}$$

where

$$t_k = -(t_1 + t_2 + \cdots + t_{k-1}).$$

From this theorem, by resorting to the definition of a multiple integral,
we may deduce an important corollary: Let $P_n$ denote the probability of the
inequality

$$\frac{(m_1 - np_1)^2}{np_1} + \frac{(m_2 - np_2)^2}{np_2} + \cdots + \frac{(m_k - np_k)^2}{np_k} < \chi^2.$$

Then, as n tends to infinity $P_n$ tends to the limit

$$\frac{1}{(2\pi)^{\frac{k-1}{2}}\sqrt{p_1p_2 \cdots p_k}}\int\!\!\int \cdots \int e^{-\frac{1}{2}\left(\frac{t_1^2}{p_1} + \frac{t_2^2}{p_2} + \cdots + \frac{t_k^2}{p_k}\right)}\,dt_1\,dt_2 \cdots dt_{k-1}$$

where the integration is extended over the (k − 1) dimensional ellipsoid

$$\varphi = \frac{t_1^2}{p_1} + \frac{t_2^2}{p_2} + \cdots + \frac{t_k^2}{p_k} < \chi^2.$$

It is easy to see that the determinant of the quadratic form $\varphi$ in
(k − 1) variables is $(p_1p_2 \cdots p_k)^{-1}$. Hence, by a proper linear
transformation the above integral reduces to

$$\frac{1}{(2\pi)^{\frac{k-1}{2}}}\int\!\!\int \cdots \int e^{-\frac{1}{2}(v_1^2 + v_2^2 + \cdots + v_{k-1}^2)}\,dv_1\,dv_2 \cdots dv_{k-1}$$

the domain of integration being $v_1^2 + v_2^2 + \cdots + v_{k-1}^2 < \chi^2$. But
this multiple integral, as will be shown in Chap. XVI, Sec. 1, can be
reduced to a simple integral

$$\frac{1}{2^{\frac{k-3}{2}}\Gamma\left(\frac{k-1}{2}\right)}\int_0^{\chi} e^{-\frac{1}{2}u^2}u^{k-2}\,du.$$

Thus

$$\lim P_n = \frac{1}{2^{\frac{k-3}{2}}\Gamma\left(\frac{k-1}{2}\right)}\int_0^{\chi} e^{-\frac{1}{2}u^2}u^{k-2}\,du.$$

The probability $Q_n = 1 - P_n$ of the opposite inequality

$$(A) \qquad \frac{(m_1 - np_1)^2}{np_1} + \frac{(m_2 - np_2)^2}{np_2} + \cdots + \frac{(m_k - np_k)^2}{np_k} \ge \chi^2$$

tends to the limit

$$\frac{1}{2^{\frac{k-3}{2}}\Gamma\left(\frac{k-1}{2}\right)}\int_{\chi}^{\infty} e^{-\frac{1}{2}u^2}u^{k-2}\,du$$

and for large n we have an approximate formula

$$Q_n = \frac{1}{2^{\frac{k-3}{2}}\Gamma\left(\frac{k-1}{2}\right)}\int_{\chi}^{\infty} e^{-\frac{1}{2}u^2}u^{k-2}\,du$$

but the degree of approximation remains unknown. In practice, to
test whether the observed deviations of frequencies from their expected
values are significant, the value of the sum (A), say $\chi^2$, is found; then
by the above approximate formula the probability that the sum (A) will
be greater than $\chi^2$ is computed. If this probability is very small, then
the obtained system of deviations is significantly different from what
could be expected as a result of chance alone. The lack of information
as to the error incurred by using an approximate expression of $Q_n$ renders
the application of this "$\chi^2$-test" devised by Pearson somewhat dubious.
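
The approximate formula for $Q_n$ can be evaluated by elementary numerical integration. The sketch below is only an illustration: the function name, the use of Simpson's rule, the truncation of the infinite range at $\chi + 40$, and the sample frequencies are all assumptions made here, not part of the text.

```python
import math

def chi_square_tail(chi2, k, upper=40.0, steps=20000):
    """Limiting value of Q_n for k alternatives, by Simpson's rule on [chi, chi + upper]."""
    chi = math.sqrt(chi2)
    h = upper / steps  # steps must be even for Simpson's rule

    def f(u):
        return math.exp(-0.5 * u * u) * u ** (k - 2)

    total = f(chi) + f(chi + upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(chi + i * h)
    return (total * h / 3.0) / (2.0 ** ((k - 3) / 2.0) * math.gamma((k - 1) / 2.0))

# Hypothetical data: 60 throws of a die, expected frequency 10 for each face.
observed = [8, 12, 9, 11, 6, 14]
expected = [10.0] * 6
chi2 = sum((m - e) ** 2 / e for m, e in zip(observed, expected))
print(chi2, chi_square_tail(chi2, k=6))
```

For k = 3 the integral has the closed form $e^{-\chi^2/2}$, which gives a convenient check of the numerical routine.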


Hypothetical Explanation of Empirically Verified Cases of 

Normal Correlation 

13. Normal distribution in two dimensions plays an important part
in target practice. It is generally assumed on the basis of varied evidence
collected in actual target practice that points of a target hit by projectiles
are scattered in a manner suggesting normal distribution. By referring
points hit by projectiles to a fixed coordinate system on the target, it is
possible from their coordinates to find approximately (provided the
number of shots is large) the elements of ellipses of equal probability.
Dividing the surface of the target into regions of equal probabilities as
described in Sec. 4, and counting the actual number of hits in each
region, the resulting numbers in many reported instances are nearly
equal. That and the agreement with other criteria are generally
considered as evidence in favor of assuming the probability in target
practice to be normally distributed.

Two-dimensional normal distribution or normal correlation has been
found to exist between measurable attributes, such as the length of the
body and weight of living organisms. Attributes like statures of parents
and their descendants, according to Galton, again show evidence of
normal correlation.

Facing such a variety of facts pointing to the existence of normal
correlation, one is tempted to account for it by some more or less plausible
hypothesis. It is generally assumed that deviations of two magnitudes
from their mean values are caused by the combined action of a great
many independent causes, each affecting both magnitudes in a very small
degree. Clearly, the resulting deviations under such circumstances may
be regarded as components of the sum of a great many independent
vectors. Then, to explain the existence of normal correlation, reference
is made to the fundamental theorem in Sec. 11.


Problems for Solution 

1. Let p denote the probability that two normally distributed variables (with
means = 0) will have values of opposite signs. Show that between p and the
correlation coefficient r the following relation holds:

$$r = \cos p\pi.$$


2. Variables x, y (with the means = 0) are normally distributed. Show that the
probability for the point x, y to be located in an ellipse

$$\frac{x^2}{\sigma_1^2} - 2r\frac{x}{\sigma_1}\frac{y}{\sigma_2} + \frac{y^2}{\sigma_2^2} = l$$

is greater than the probability corresponding to any other domain of the same area.

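3. Three dice colored in white, red, and blue are tossed simultaneously n times.
Let X and Y represent the total number of points on pairs: white, red and white, blue.
Show that the probability of simultaneous inequalities

$$7n + t_0\sqrt{\tfrac{35}{6}n} < X < 7n + t_1\sqrt{\tfrac{35}{6}n}; \qquad 7n + \tau_0\sqrt{\tfrac{35}{6}n} < Y < 7n + \tau_1\sqrt{\tfrac{35}{6}n}$$

tends to the limit

$$\frac{1}{\pi\sqrt{3}}\int_{t_0}^{t_1}\int_{\tau_0}^{\tau_1} e^{-\frac{2}{3}(t^2 - t\tau + \tau^2)}\,dt\,d\tau$$

as $n \to \infty$.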
4. Three dice, white, red, and blue, are tossed simultaneously n times. If k and l
are frequencies of 10 points on pairs: white, red; red, blue; show that the probability 
of simultaneous inequalities 




n 



tends to the limit 

27r\/l20j/o Jr„ 


as n —> 00 . 

5. Two players, A and B, take part in a game arranged as follows: Each time one
ball is taken from an urn containing 8 white, 6 black, and 1 red ball; if this ball is

white, A and B both gain $1;
black, A loses $2, B loses $4;
red, A gains $4, B gains $16.

Let s_n and σ_n be the sums gained by A and B after n games. Show that the probability
of simultaneous inequalities

$$t_0\sqrt{6.4n} < s_n < t_1\sqrt{6.4n}; \qquad \tau_0\sqrt{48n} < \sigma_n < \tau_1\sqrt{48n}$$

for very large n will be approximately equal to



-W‘ + r^-V%r)^r. 


Note that the probability of the inequality s_n σ_n < 0 is about 0.13 (not very small),
so that it is not very unlikely that the luck will be with one player and against another.
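
The figure 0.13 quoted in the note can be reproduced from the data of the problem. The sketch below (an illustration with variable names chosen here) computes the per-game means, variances, and covariance of the two gains exactly, derives the correlation coefficient, and then uses the relation r = cos pπ of Problem 1, which applies to the limiting normal law, to estimate the probability of opposite signs.

```python
from fractions import Fraction
import math

# (probability, gain of A, gain of B) for a white, black, and red ball
outcomes = [
    (Fraction(8, 15), 1, 1),
    (Fraction(6, 15), -2, -4),
    (Fraction(1, 15), 4, 16),
]

mean_a = sum(w * a for w, a, b in outcomes)
mean_b = sum(w * b for w, a, b in outcomes)
var_a = sum(w * a * a for w, a, b in outcomes) - mean_a ** 2
var_b = sum(w * b * b for w, a, b in outcomes) - mean_b ** 2
cov = sum(w * a * b for w, a, b in outcomes) - mean_a * mean_b

r = float(cov) / math.sqrt(float(var_a * var_b))
p_opposite = math.acos(r) / math.pi  # from r = cos(p*pi), valid for the normal limit

print(mean_a, mean_b, var_a, var_b, cov)
print(r, p_opposite)
```

Both games are fair (the means vanish), and twice the per-game variances give the factors under the square roots in the inequalities of the problem.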

6. Concentric circles $C_1, C_2, C_3, \ldots$ in unlimited numbers are described about
the origin O. Points $P_1, P_2, P_3, \ldots$ are taken at random on these circles. Let R
be the end point of the vector representing the sum of vectors $OP_1, OP_2, OP_3, \ldots$.
If $r_1, r_2, r_3, \ldots$ are radii of $C_1, C_2, C_3, \ldots$ and the condition

$$\frac{r_1^3 + r_2^3 + \cdots + r_n^3}{(r_1^2 + r_2^2 + \cdots + r_n^2)^{\frac{3}{2}}} \to 0 \quad \text{as} \quad n \to \infty$$

is fulfilled, show that the probability that R will lie within the circle described with the
radius $\rho$ about the origin will be very nearly equal to

$$1 - e^{-\frac{\rho^2}{r_1^2 + r_2^2 + \cdots + r_n^2}}$$

for large n.
CHAPTER XVI


DISTRIBUTION OF CERTAIN FUNCTIONS OF NORMALLY
DISTRIBUTED VARIABLES


1. In modern statistics much emphasis is laid upon distributions of
certain functions involving normally distributed variables. Such
distributions are considered as a basis for various "tests of significance"
for small samples, that is, when the number of observed data is small.
Some of the most important cases of this kind will be considered in this
chapter.

Problem 1. Independent variables $x_1, x_2, \ldots x_n$ are normally
distributed about their common mean = 0 with the same standard
deviation $\sigma$. Find the distribution function of the sum of their squares

$$s = x_1^2 + x_2^2 + \cdots + x_n^2.$$

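Solution. The inequality

$$x_i^2 < t$$

being equivalent to

$$-\sqrt{t} < x_i < \sqrt{t},$$

the distribution function of $x_i^2$ is

$$F_1(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\sqrt{t}}^{\sqrt{t}} e^{-\frac{x^2}{2\sigma^2}}\,dx = \frac{1}{\sigma\sqrt{2\pi}}\int_0^t e^{-\frac{u}{2\sigma^2}}u^{-\frac{1}{2}}\,du \quad \text{for} \quad t \ge 0$$

$$F_1(t) = 0 \quad \text{for} \quad t < 0.$$

Hence, the characteristic function of any one of the variables $x_1^2$, $x_2^2$,
. . . $x_n^2$ is

$$(\sigma\sqrt{2})^{-1}\left(\frac{1}{2\sigma^2} - it\right)^{-\frac{1}{2}}$$

and that of their sum

$$\varphi(t) = (\sigma\sqrt{2})^{-n}\left(\frac{1}{2\sigma^2} - it\right)^{-\frac{n}{2}}.$$

Consequently, the distribution function of s is expressed by

$$F(t) = C + \frac{(\sigma\sqrt{2})^{-n}}{2\pi}\int_{-\infty}^{\infty}\frac{1 - e^{-itv}}{iv}\left(\frac{1}{2\sigma^2} - iv\right)^{-\frac{n}{2}}\,dv$$
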
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and it remains to transform this integral. To this end, imagine a variable 
distributed over the interval (0, + ») with the density 


2 2^-n _Ji n_ 


<i) 


e 


Its characteristic function is 

(.rV2)-(i ~ .<)-» 

and since the distribution function is given a priori, we must have for 
t ^ 0 


(<r\/2)-" 


ri 

Hence 


X' 


e ^du = const. + 

Ztt 


r “ 1 — c--'"' , 

I - r-i - . dV. 


F{t) = const 


1 

st. H--r-v I e du. 


The constant must be = 0 since F{t) as well as the integral in the right 
member vanishes for t = 0. The final expression is therefore: 


Fit) = 




t — 1 

e du 


for 


t > 0 


((rV2)”r( 

Fit) =0 for < ^ 0. 
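The final expression of Problem 1 can be checked numerically. The following is only a minimal sketch: the function name `sum_squares_cdf`, the Simpson rule, and the substitution $u = w^2$ (used to avoid the endpoint singularity when n = 1) are choices made here for illustration and are not part of the text.

```python
import math


def sum_squares_cdf(t, n, sigma=1.0, steps=2000):
    """F(t) for s = x_1^2 + ... + x_n^2 with x_i normal, mean 0, deviation sigma."""
    if t <= 0:
        return 0.0
    # With u = w^2 the integral of u^(n/2 - 1) e^(-u / (2 sigma^2)) over (0, t)
    # becomes the integral of 2 w^(n-1) e^(-w^2 / (2 sigma^2)) over (0, sqrt(t)).
    upper = math.sqrt(t)
    h = upper / steps  # steps is assumed even (Simpson's rule)

    def g(w):
        return 2.0 * w ** (n - 1) * math.exp(-w * w / (2.0 * sigma ** 2))

    total = g(0.0) + g(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(k * h)
    integral = total * h / 3.0
    return integral / ((sigma * math.sqrt(2.0)) ** n * math.gamma(n / 2.0))
```

For n = 2 and $\sigma = 1$ the integral is elementary and gives $F(t) = 1 - e^{-t/2}$, which the sketch should reproduce to within the quadrature error.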

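The probability of the inequality

\[ x_1^2 + x_2^2 + \cdots + x_n^2 < t, \]

on the other hand, can be expressed directly as a multiple integral

\[ \frac{1}{(\sigma\sqrt{2\pi})^n} \int\!\!\int \cdots \int e^{-\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{2\sigma^2}}\,dx_1\,dx_2 \cdots dx_n \]

extended over the volume of the n-dimensional sphere

\[ x_1^2 + x_2^2 + \cdots + x_n^2 < t. \]

By equating both expressions of F(t), we obtain an important transformation,

\[ (1) \qquad \int\!\!\int \cdots \int e^{-\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{2\sigma^2}}\,dx_1\,dx_2 \cdots dx_n = \frac{\pi^{\frac{n}{2}}}{\Gamma\!\left(\frac{n}{2}\right)} \int_0^t e^{-\frac{u}{2\sigma^2}} u^{\frac{n}{2}-1}\,du. \]

If $F(x_1^2 + x_2^2 + \cdots + x_n^2)$ is an arbitrary function of

\[ u = x_1^2 + x_2^2 + \cdots + x_n^2, \]

the integral

\[ \frac{1}{(\sigma\sqrt{2\pi})^n} \int\!\!\int \cdots \int e^{-\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{2\sigma^2}} F(x_1^2 + \cdots + x_n^2)\,dx_1\,dx_2 \cdots dx_n \]

extended over the whole n-dimensional space represents the mathematical
expectation of F(u). On the other hand, the distribution function of
u being known, the same multiple integral will be equal to

\[ \frac{1}{(\sigma\sqrt{2})^n\,\Gamma\!\left(\frac{n}{2}\right)} \int_0^\infty e^{-\frac{u}{2\sigma^2}} F(u)\,u^{\frac{n-2}{2}}\,du. \]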


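Taking in particular $\sigma = 1$, $F(u) = e^{\frac{u}{2}} f(u)$, we get the formula

\[ (2) \qquad \int\!\!\int \cdots \int f(x_1^2 + x_2^2 + \cdots + x_n^2)\,dx_1\,dx_2 \cdots dx_n = \frac{\pi^{\frac{n}{2}}}{\Gamma\!\left(\frac{n}{2}\right)} \int_0^\infty f(u)\,u^{\frac{n-2}{2}}\,du, \]

which will be used later.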

2. Problem 2. Variables $x_1, x_2, \ldots, x_n$ are defined as in Prob. 1.
Denoting their arithmetic mean by

\[ s = \frac{x_1 + x_2 + \cdots + x_n}{n}, \]

find the distribution function of the sum

\[ \Sigma = (x_1 - s)^2 + (x_2 - s)^2 + \cdots + (x_n - s)^2. \]

Solution. The probability of the inequality

\[ \Sigma < t \]

is expressed by the multiple integral

\[ F(t) = \frac{1}{(\sigma\sqrt{2\pi})^n} \int\!\!\int \cdots \int e^{-\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{2\sigma^2}}\,dx_1\,dx_2 \cdots dx_n \]

extended over the volume of the n-dimensional ellipsoid

\[ (x_1 - s)^2 + (x_2 - s)^2 + \cdots + (x_n - s)^2 < t. \]


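Let

\[ x_1 - s = u_1, \quad x_2 - s = u_2, \quad \ldots \quad x_n - s = u_n, \]

whence

\[ u_1 + u_2 + \cdots + u_n = 0 \]

and

\[ x_1^2 + x_2^2 + \cdots + x_n^2 = u_1^2 + u_2^2 + \cdots + u_n^2 + ns^2. \]

Taking $u_1, u_2, \ldots, u_{n-1}$, and s for new variables, we must first find the
Jacobian J of $x_1, x_2, \ldots, x_n$ with respect to $u_1, u_2, \ldots, u_{n-1}, s$. It is

\[ J = \begin{vmatrix} 1 & 1 & 0 & 0 & \cdots & 0 \\ 1 & 0 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & & & \vdots \\ 1 & 0 & 0 & 0 & \cdots & 1 \\ 1 & -1 & -1 & -1 & \cdots & -1 \end{vmatrix} = \begin{vmatrix} 1 & 1 & 0 & 0 & \cdots & 0 \\ 1 & 0 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & & & \vdots \\ 1 & 0 & 0 & 0 & \cdots & 1 \\ n & 0 & 0 & 0 & \cdots & 0 \end{vmatrix} = (-1)^{n-1} n. \]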


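In the new variables the expression for F(t) will be

\[ F(t) = \frac{n}{(\sigma\sqrt{2\pi})^n} \int\!\!\int \cdots \int e^{-\frac{u_1^2 + u_2^2 + \cdots + u_n^2 + ns^2}{2\sigma^2}}\,ds\,du_1\,du_2 \cdots du_{n-1}, \]

where $u_n = -(u_1 + u_2 + \cdots + u_{n-1})$, and the domain of integration in the space of the new variables is defined by

\[ -\infty < s < \infty \]

\[ u_1^2 + u_2^2 + \cdots + u_{n-1}^2 + (u_1 + u_2 + \cdots + u_{n-1})^2 < t. \]

After performing the integration with respect to s, we get

\[ F(t) = \frac{\sqrt{n}}{(\sigma\sqrt{2\pi})^{n-1}} \int\!\!\int \cdots \int e^{-\frac{u_1^2 + u_2^2 + \cdots + u_n^2}{2\sigma^2}}\,du_1\,du_2 \cdots du_{n-1}. \]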

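The quadratic form

\[ \varphi = u_1^2 + u_2^2 + \cdots + u_{n-1}^2 + (u_1 + u_2 + \cdots + u_{n-1})^2 \]

can be represented as a sum of the squares of (n - 1) linear forms in the
variables $u_1, u_2, \ldots, u_{n-1}$:

\[ \varphi = v_1^2 + v_2^2 + \cdots + v_{n-1}^2. \]

The Jacobian

\[ \frac{\partial(v_1, v_2, \ldots, v_{n-1})}{\partial(u_1, u_2, \ldots, u_{n-1})} \]

is the square root of the determinant of the form $\varphi$, which is the same
as the determinant of the linear forms

\[ \frac{1}{2}\frac{\partial\varphi}{\partial u_1} = 2u_1 + u_2 + \cdots + u_{n-1} \]
\[ \frac{1}{2}\frac{\partial\varphi}{\partial u_2} = u_1 + 2u_2 + \cdots + u_{n-1} \]
\[ \cdots \]
\[ \frac{1}{2}\frac{\partial\varphi}{\partial u_{n-1}} = u_1 + u_2 + \cdots + 2u_{n-1}. \]

Now, in general, for a determinant with p rows and p columns,

\[ \begin{vmatrix} \lambda & 1 & \cdots & 1 \\ 1 & \lambda & \cdots & 1 \\ \vdots & & & \vdots \\ 1 & 1 & \cdots & \lambda \end{vmatrix} = (\lambda - 1)^{p-1}(\lambda + p - 1) \]

so that the determinant of $\varphi$ is = n, whence

\[ \frac{\partial(v_1, v_2, \ldots, v_{n-1})}{\partial(u_1, u_2, \ldots, u_{n-1})} = \sqrt{n}; \qquad \frac{\partial(u_1, u_2, \ldots, u_{n-1})}{\partial(v_1, v_2, \ldots, v_{n-1})} = \frac{1}{\sqrt{n}}. \]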
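The general determinant formula $(\lambda - 1)^{p-1}(\lambda + p - 1)$ used above can be verified exactly for small p. The sketch below is illustrative only; the helper names `determinant`, `lambda_matrix`, and `closed_form` are assumptions introduced here, and exact rational arithmetic is used to avoid rounding questions.

```python
from fractions import Fraction


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    size = len(a)
    det = Fraction(1)
    for col in range(size):
        # Find a row with a nonzero entry in this column to use as pivot.
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap changes the sign
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det


def lambda_matrix(lam, p):
    """p-by-p matrix with lam on the diagonal and 1 elsewhere."""
    return [[lam if i == j else 1 for j in range(p)] for i in range(p)]


def closed_form(lam, p):
    """The value (lam - 1)^(p-1) (lam + p - 1) stated in the text."""
    return Fraction(lam - 1) ** (p - 1) * (lam + p - 1)
```

With $\lambda = 2$ and $p = n - 1$ the closed form reduces to n, which is the determinant of the form $\varphi$ used in the solution.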

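Therefore, taking $v_1, v_2, \ldots, v_{n-1}$ for new variables, F(t) can be expressed
as follows

\[ F(t) = \frac{1}{(\sigma\sqrt{2\pi})^{n-1}} \int\!\!\int \cdots \int e^{-\frac{v_1^2 + v_2^2 + \cdots + v_{n-1}^2}{2\sigma^2}}\,dv_1\,dv_2 \cdots dv_{n-1} \]

where the integral is extended over the volume of the sphere

\[ v_1^2 + v_2^2 + \cdots + v_{n-1}^2 < t. \]

This multiple integral is exactly of the type considered in the preceding
problem, and it can be reduced to a simple integral as follows

\[ \int\!\!\int \cdots \int e^{-\frac{v_1^2 + v_2^2 + \cdots + v_{n-1}^2}{2\sigma^2}}\,dv_1\,dv_2 \cdots dv_{n-1} = \frac{\pi^{\frac{n-1}{2}}}{\Gamma\!\left(\frac{n-1}{2}\right)} \int_0^t e^{-\frac{u}{2\sigma^2}} u^{\frac{n-3}{2}}\,du. \]

After substitution, the final expression of F(t) is

\[ F(t) = \frac{1}{(\sigma\sqrt{2})^{n-1}\,\Gamma\!\left(\frac{n-1}{2}\right)} \int_0^t e^{-\frac{u}{2\sigma^2}} u^{\frac{n-3}{2}}\,du \quad \text{for} \quad t > 0 \]

\[ F(t) = 0 \quad \text{for} \quad t \le 0. \]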
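The result of Problem 2 says that the sum of squared deviations from the sample mean is distributed as a sum of n - 1 squares in Problem 1. A rough simulation can make this plausible. The sketch below is an assumption-laden illustration: the function names, the fixed seed, the number of trials, and the Simpson quadrature are all choices made here, not the author's.

```python
import math
import random


def squared_deviation_cdf(t, n, sigma=1.0, steps=2000):
    """F(t) of Problem 2: the Problem 1 formula with n - 1 in place of n (n >= 2)."""
    if t <= 0:
        return 0.0
    m = n - 1
    upper = math.sqrt(t)  # substitution u = w^2, as a smooth integrand is easier
    h = upper / steps

    def g(w):
        return 2.0 * w ** (m - 1) * math.exp(-w * w / (2.0 * sigma ** 2))

    total = g(0.0) + g(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(k * h)
    return (total * h / 3.0) / ((sigma * math.sqrt(2.0)) ** m * math.gamma(m / 2.0))


def empirical_cdf(t, n, sigma=1.0, trials=20000, seed=0):
    """Fraction of simulated samples whose squared deviations sum to less than t."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(xs) / n
        if sum((x - mean) ** 2 for x in xs) < t:
            hits += 1
    return hits / trials
```

For n = 5 and $\sigma = 1$ the integral is elementary, $F(t) = 1 - e^{-t/2}(1 + t/2)$, so the quadrature can be compared with a closed form and the simulation with both.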


3. Problem 3. Variables $x_1, x_2, \ldots, x_n$ are defined as in Prob. 1.
As in Prob. 2, we set

\[ s = \frac{x_1 + x_2 + \cdots + x_n}{n} \]

\[ u_i = x_i - s; \qquad i = 1, 2, \ldots, n \]

and introduce the quantity

\[ \epsilon = \sqrt{\frac{u_1^2 + u_2^2 + \cdots + u_n^2}{n}}. \]

What is the distribution function of the ratio

\[ \frac{s}{\epsilon} \]

or, which is the same, the probability F(t) of the inequality

\[ s < t\epsilon? \]


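Solution. First, assuming t to be positive, let us find the probability
of the inequality

\[ s \ge t\epsilon \]

or

\[ u_1^2 + u_2^2 + \cdots + u_n^2 \le \frac{ns^2}{t^2}. \]

This probability can be presented in the form

\[ \phi(t) = \frac{n}{(\sigma\sqrt{2\pi})^n} \int_0^\infty e^{-\frac{ns^2}{2\sigma^2}}\,\Psi(s)\,ds \]

where the multiple integral

\[ \Psi(s) = \int\!\!\int \cdots \int e^{-\frac{u_1^2 + u_2^2 + \cdots + u_n^2}{2\sigma^2}}\,du_1\,du_2 \cdots du_{n-1} \]

in which

\[ u_n = -(u_1 + u_2 + \cdots + u_{n-1}) \]

is extended over the domain

\[ u_1^2 + u_2^2 + \cdots + u_{n-1}^2 + (u_1 + u_2 + \cdots + u_{n-1})^2 \le \frac{ns^2}{t^2}. \]

Proceeding in exactly the same manner as in Prob. 2, we can transform
$\Psi(s)$ into

\[ \Psi(s) = \frac{1}{\sqrt{n}} \int\!\!\int \cdots \int e^{-\frac{v_1^2 + v_2^2 + \cdots + v_{n-1}^2}{2\sigma^2}}\,dv_1\,dv_2 \cdots dv_{n-1} \]

extended over the sphere

\[ v_1^2 + v_2^2 + \cdots + v_{n-1}^2 \le \frac{ns^2}{t^2} \]

in the space of the variables $v_1, v_2, \ldots, v_{n-1}$. For this multiple integral
we can substitute a simple integral

\[ \frac{\pi^{\frac{n-1}{2}}}{\Gamma\!\left(\frac{n-1}{2}\right)} \int_0^{\frac{ns^2}{t^2}} e^{-\frac{u}{2\sigma^2}} u^{\frac{n-3}{2}}\,du = \frac{2\pi^{\frac{n-1}{2}} n^{\frac{n-1}{2}} s^{n-1}}{\Gamma\!\left(\frac{n-1}{2}\right)} \int_t^\infty e^{-\frac{ns^2}{2\sigma^2 z^2}} z^{-n}\,dz \]

(the second form follows from the substitution $u = ns^2/z^2$) and thus reduce $\Psi(s)$ to the form

\[ \Psi(s) = \frac{2\pi^{\frac{n-1}{2}} n^{\frac{n-2}{2}} s^{n-1}}{\Gamma\!\left(\frac{n-1}{2}\right)} \int_t^\infty e^{-\frac{ns^2}{2\sigma^2 z^2}} z^{-n}\,dz. \]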

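After substitution we can express $\phi(t)$ as a repeated integral

\[ \phi(t) = \frac{2 n^{\frac{n}{2}}}{\sqrt{\pi}\,(\sigma\sqrt{2})^n\,\Gamma\!\left(\frac{n-1}{2}\right)} \int_0^\infty \int_t^\infty e^{-\frac{ns^2}{2\sigma^2}\left(1 + \frac{1}{z^2}\right)} s^{n-1} z^{-n}\,dz\,ds. \]

The derivative of $\phi(t)$ is

\[ \frac{d\phi}{dt} = -\frac{2 n^{\frac{n}{2}}\,t^{-n}}{\sqrt{\pi}\,(\sigma\sqrt{2})^n\,\Gamma\!\left(\frac{n-1}{2}\right)} \int_0^\infty e^{-\frac{ns^2}{2\sigma^2}\left(1 + \frac{1}{t^2}\right)} s^{n-1}\,ds = -\frac{\Gamma\!\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n-1}{2}\right)}\,(1 + t^2)^{-\frac{n}{2}} \]

whence

\[ \phi(t) = C - \frac{\Gamma\!\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n-1}{2}\right)} \int_{-\infty}^{t} (1 + z^2)^{-\frac{n}{2}}\,dz. \]

Now $\phi(t)$ tends to 0 as t increases indefinitely, and

\[ \frac{\Gamma\!\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n-1}{2}\right)} \int_{-\infty}^{\infty} (1 + z^2)^{-\frac{n}{2}}\,dz = 1, \]

so that C = 1 and

\[ \phi(t) = 1 - \frac{\Gamma\!\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n-1}{2}\right)} \int_{-\infty}^{t} (1 + z^2)^{-\frac{n}{2}}\,dz. \]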


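Such is the probability of the inequality

\[ s \ge t\epsilon. \]

The probability F(t) of the inequality

\[ s < t\epsilon \]

will be $1 - \phi(t)$ or

\[ F(t) = \frac{\Gamma\!\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n-1}{2}\right)} \int_{-\infty}^{t} (1 + z^2)^{-\frac{n}{2}}\,dz \]

but this is established only for positive t. However, this result holds
for negative t as well. For t being negative $= -\tau$ the inequality

\[ s < -\tau\epsilon \]

is entirely equivalent to

\[ -s > \tau\epsilon \]

and its probability is evidently

\[ F(-\tau) = \phi(\tau) = 1 - \frac{\Gamma\!\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n-1}{2}\right)} \int_{-\infty}^{\tau} (1 + z^2)^{-\frac{n}{2}}\,dz. \]

But

\[ \frac{\Gamma\!\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n-1}{2}\right)} \int_{-\infty}^{\infty} (1 + z^2)^{-\frac{n}{2}}\,dz = 1 \]

which permits of writing the preceding expression for $F(-\tau)$ as follows

\[ F(-\tau) = \frac{\Gamma\!\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n-1}{2}\right)} \int_{\tau}^{\infty} (1 + z^2)^{-\frac{n}{2}}\,dz = \frac{\Gamma\!\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n-1}{2}\right)} \int_{-\infty}^{-\tau} (1 + z^2)^{-\frac{n}{2}}\,dz. \]

Thus, no matter whether t is positive or negative, the distribution
function of the ratio

\[ \frac{s}{\epsilon} \]

or the probability of the inequality

\[ s < t\epsilon \]

is given by

\[ F(t) = \frac{\Gamma\!\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n-1}{2}\right)} \int_{-\infty}^{t} (1 + z^2)^{-\frac{n}{2}}\,dz. \]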



The distribution of the quotient $s/\epsilon$ was discovered by a British
statistician who wrote under the pseudonym "Student," and it is
commonly referred to as "Student's distribution." The first rigorous proof
was published by R. A. Fisher.
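The distribution function of the ratio $s/\epsilon$ found in Problem 3 is easy to evaluate numerically. The following is a sketch under stated assumptions: the name `student_ratio_cdf` and the substitution $z = \tan\theta$ (which turns the integral into $\int \cos^{n-2}\theta\,d\theta$ over a finite range) are introduced here for illustration.

```python
import math


def student_ratio_cdf(t, n, steps=2000):
    """Probability of s < t * eps for n >= 2, via K * integral of (1 + z^2)^(-n/2)."""
    k = math.gamma(n / 2.0) / (math.sqrt(math.pi) * math.gamma((n - 1) / 2.0))
    # z = tan(theta): (1 + z^2)^(-n/2) dz = cos(theta)^(n-2) d(theta)
    lower, upper = -math.pi / 2.0, math.atan(t)
    h = (upper - lower) / steps  # steps is assumed even (Simpson's rule)

    def g(theta):
        return math.cos(theta) ** (n - 2)

    total = g(lower) + g(upper)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * g(lower + j * h)
    return k * total * h / 3.0
```

Two closed forms serve as checks: for n = 2 the density is $1/(\pi(1 + z^2))$, so F(1) = 3/4; for n = 3 one finds $F(t) = \frac{1}{2}\left(1 + t/\sqrt{1 + t^2}\right)$. In the customary modern normalization the variable is $t' = \sqrt{n-1}\,s/\epsilon$ with n - 1 degrees of freedom.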

4. Problem 4. Variables x, y are in normal correlation. A sample of
n corresponding pairs, $x_1, y_1; x_2, y_2; \ldots; x_n, y_n$ is taken and the
"correlation coefficient of the sample" is found by the formula

\[ \rho = \frac{\sum (x_i - s)(y_i - s')}{\sqrt{\sum (x_i - s)^2 \cdot \sum (y_i - s')^2}} \]

where, for the sake of abbreviation,

\[ s = \frac{x_1 + x_2 + \cdots + x_n}{n}, \qquad s' = \frac{y_1 + y_2 + \cdots + y_n}{n}. \]

Find the distribution function of $\rho$, that is, the probability P of the
inequality $\rho < R$ for a given R (-1 < R < 1).

Solution. Since the expression of $\rho$ is homogeneous of degree 0 in
$x_1, x_2, \ldots, x_n$; $y_1, y_2, \ldots, y_n$ we can assume $\sigma_1 = \sigma_2 = 1$. Also without
loss of generality the expectations of x and y may be supposed = 0.
Denoting by r the correlation coefficient of x and y, the density of
probability in the two-dimensional distribution will be:

\[ \frac{1}{2\pi(1 - r^2)^{\frac{1}{2}}}\,e^{-\frac{1}{2(1 - r^2)}(x^2 + y^2 - 2rxy)}. \]

Hence the required probability will be expressed by the multiple integral

\[ P = \frac{1}{(2\pi)^n (1 - r^2)^{\frac{n}{2}}} \int\!\!\int \cdots \int e^{-\frac{\varphi}{2(1 - r^2)}}\,dx_1 \cdots dx_n\,dy_1 \cdots dy_n \]

extended over the 2n-dimensional domain

\[ (3) \qquad \sum (x_i - s)(y_i - s') < R\sqrt{\sum (x_i - s)^2 \cdot \sum (y_i - s')^2} \]

and

\[ (4) \qquad \varphi = \sum x_i^2 + \sum y_i^2 - 2r \sum x_i y_i. \]

Replacing $x_i, y_i$ (i = 1, 2, ... n), respectively, by $\sqrt{1 - r^2}\,x_i$, $\sqrt{1 - r^2}\,y_i$,
we can write P thus:

\[ P = \frac{(1 - r^2)^{\frac{n}{2}}}{(2\pi)^n} \int\!\!\int \cdots \int e^{-\frac{\varphi}{2}}\,dx_1 \cdots dx_n\,dy_1 \cdots dy_n \]

while (3) and (4) still hold but with the new notation for the variables.
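Let us set now

\[ x_i - s = u_i, \qquad y_i - s' = v_i, \]

then

\[ u_1 + u_2 + \cdots + u_n = 0, \qquad v_1 + v_2 + \cdots + v_n = 0. \]

Introducing s, s'; $u_1, u_2, \ldots, u_{n-1}$; $v_1, v_2, \ldots, v_{n-1}$ as new variables, we
find as in Sec. 2

\[ P = \frac{n^2 (1 - r^2)^{\frac{n}{2}}}{(2\pi)^n} \int\!\!\int \cdots \int e^{-\frac{\varphi}{2}}\,ds\,ds'\,du_1 \cdots du_{n-1}\,dv_1 \cdots dv_{n-1} \]

where

\[ \varphi = ns^2 + ns'^2 - 2nrss' + \sum u_i^2 + \sum v_i^2 - 2r \sum u_i v_i \]

and the domain of integration is defined by the inequalities

\[ -\infty < s < \infty; \qquad -\infty < s' < \infty \]

\[ \sum u_i v_i < R\sqrt{\sum u_i^2 \cdot \sum v_i^2}. \]

Now by the same linear transformation the quadratic forms $\sum u_i^2$, $\sum v_i^2$
(each containing n - 1 independent variables) can be transformed
into

\[ \sum_{i=1}^{n-1} w_i^2, \qquad \sum_{i=1}^{n-1} z_i^2; \]

at the same time

\[ \sum_{i=1}^{n} u_i v_i = \sum_{i=1}^{n-1} w_i z_i. \]

Proceeding as in Sec. 2 and noting that

\[ \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-\frac{n}{2}(s^2 + s'^2 - 2rss')}\,ds\,ds' = \frac{2\pi}{n\sqrt{1 - r^2}} \]

we find that

\[ P = \frac{(1 - r^2)^{\frac{n-1}{2}}}{(2\pi)^{n-1}} \int\!\!\int \cdots \int e^{-\frac{\chi}{2}}\,dw_1 \cdots dw_{n-1}\,dz_1 \cdots dz_{n-1} \]

where

\[ \chi = \sum w_i^2 + \sum z_i^2 - 2r \sum w_i z_i \]

and the domain of integration in the space of 2n - 2 dimensions is defined
by

\[ \sum w_i z_i < R\sqrt{\sum w_i^2 \cdot \sum z_i^2}. \]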

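We shall integrate now in regard to variables $z_1, z_2, \ldots, z_{n-1}$ for a fixed
system of values $w_1, w_2, \ldots, w_{n-1}$. To this end we use an orthogonal
transformation

\[ z_1 = c_{1,1}\zeta_1 + c_{1,2}\zeta_2 + \cdots + c_{1,n-1}\zeta_{n-1} \]
\[ z_2 = c_{2,1}\zeta_1 + c_{2,2}\zeta_2 + \cdots + c_{2,n-1}\zeta_{n-1} \]
\[ \cdots \]
\[ z_{n-1} = c_{n-1,1}\zeta_1 + c_{n-1,2}\zeta_2 + \cdots + c_{n-1,n-1}\zeta_{n-1} \]

in which the elements of the first column are

\[ c_{i,1} = \frac{w_i}{\sqrt{w_1^2 + \cdots + w_{n-1}^2}} = \frac{w_i}{w}. \]

Defining $\xi_1, \xi_2, \ldots, \xi_{n-1}$ by

\[ w_1 = c_{1,1}\xi_1 + c_{1,2}\xi_2 + \cdots + c_{1,n-1}\xi_{n-1} \]
\[ w_2 = c_{2,1}\xi_1 + c_{2,2}\xi_2 + \cdots + c_{2,n-1}\xi_{n-1} \]
\[ \cdots \]

we shall have $\xi_1 = w$, $\xi_2 = \cdots = \xi_{n-1} = 0$. By the properties of
orthogonal transformations

\[ \sum z_i^2 = \sum \zeta_i^2, \qquad \sum z_i w_i = \sum \zeta_i \xi_i = w\zeta_1 \]

so that for a fixed system of values $w_1, w_2, \ldots, w_{n-1}$ the domain of
integration in the space of variables $\zeta_1, \zeta_2, \ldots, \zeta_{n-1}$ will be

\[ (5) \qquad \zeta_1 < R\sqrt{\zeta_1^2 + \zeta_2^2 + \cdots + \zeta_{n-1}^2}. \]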

Thus we must first evaluate the integral 

7 = // . . . * * * dfn-i. 

If < 0 no restriction is imposed upon ^ 2 , . . . fn-i) if Ti > 0, then 


fi + • • • + fLi > 



Consequently the result of integration in regard to f 2 > • • . fn-i can be 
presented thus: 


r»tg» 

/ = ce 2 


/; 








+r»-x*) 


df2 


dfn-l 


where the inner integral is extended over the domain 


ri + • • • + fLi 



and c is a constant. Making use of formula (1), Sec. 1, the expression of 
J reduces to 


J = ce ^ — 


n-2 


2ir 2" f 



€ 


^i»4-rwri 






^v^~^dv. 


This has to be multiplied by 


and integrated over the whole space of the variables $w_1, w_2, \ldots, w_{n-1}$.
The resulting expression for P will be
P = const.


where 


n-2 

2t 2 (1 


{2^r 




Mdw\ 


dWn~\ 


M 




- !si^ 1 rwn 






r 2 

e ^v^'^dv. 


Now we differentiate in regard to reverse the order of integrations, 
and make use of formula (2), Sec. 1 ; the resulting value of dP/dR will 
then be expressed as a double integral 


dP 

dR 


1 ri - 1 n- 4 

TT Hi - r2) ^ (1 - li^)^ p p -^(e + u'0+/?r<u 

V-^(” -i) J” ' 


(iw)"" hltdu, 


or 


dP - r^) ^ (1 - R^y 2 
dR irr(n - 2) 


ri: 




Since 


In the double integral we make transformation to new variables f, tj 
defined by 


^ “ u 


r) ~ tu. 

The Jacobian of u in regard to 77 , being we have 

u^)-^Rrtu 


n " -hp+uK 


= ir 


J f* 00 

[ 


{tuY'^^dldu 


=‘/;x 


- -) 


0 i 

^ = r(n - 1) ' 


>X' 


(c/i< - jfiV)’*-!’ 


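and so, finally,

\[ \frac{dP}{dR} = \frac{n - 2}{\pi}\,(1 - r^2)^{\frac{n-1}{2}} (1 - R^2)^{\frac{n-4}{2}} \int_0^\infty \frac{dt}{(\operatorname{ch} t - Rr)^{n-1}}. \]

In case r = 0, that is, when the variables x, y are uncorrelated, we have
a very simple expression of P:

\[ P = \frac{\Gamma\!\left(\frac{n-1}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n-2}{2}\right)} \int_{-1}^{R} (1 - \rho^2)^{\frac{n-4}{2}}\,d\rho. \]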




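In case $r \ne 0$ the integral

\[ \int_0^\infty \frac{dt}{(\operatorname{ch} t - Rr)^{n-1}} \]

can still be found in finite form. We have, in fact,

\[ \int_0^\infty \frac{dt}{\operatorname{ch} t - Rr} = \frac{1}{\sqrt{1 - R^2 r^2}} \left[ \frac{\pi}{2} + \arcsin(Rr) \right], \]

whence

\[ \int_0^\infty \frac{dt}{(\operatorname{ch} t - Rr)^{n-1}} = \frac{1}{(n-2)!\,r^{n-2}}\,\frac{d^{n-2}}{dR^{n-2}} \left\{ \frac{1}{\sqrt{1 - R^2 r^2}} \left[ \frac{\pi}{2} + \arcsin(Rr) \right] \right\}, \]

and so

\[ P = A \int_{-1}^{R} (1 - \rho^2)^{\frac{n-4}{2}}\,\frac{d^{n-2}}{d\rho^{n-2}} \left\{ [1 - \rho^2 r^2]^{-\frac{1}{2}} \left[ \frac{\pi}{2} + \arcsin(r\rho) \right] \right\} d\rho, \]

where

\[ A = \frac{r^{-(n-2)} (1 - r^2)^{\frac{n-1}{2}}}{\pi (n - 3)!}. \]

When n is an even number, this integral appears in a very simple finite
form, but in case of an odd n certain integrals of a rather complicated
type appear. Besides, the behavior of P for somewhat large n cannot
be easily grasped by using this integral expression for P.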

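5. Fisher, who was first to discover the rigorous distribution of the
correlation coefficient, called attention to the fact that, setting

\[ \operatorname{th} z = \frac{\sum (x_i - s)(y_i - s')}{\sqrt{\sum (x_i - s)^2 \cdot \sum (y_i - s')^2}}, \]

the distribution of z will be nearly normal even for comparatively small
values of n. Let us set $\operatorname{th} \omega = R$, $\operatorname{th} \zeta = r$; then P can be expressed thus:

\[ P = \frac{n - 2}{\pi} \int_{-\infty}^{\omega} \int_0^\infty \frac{\operatorname{ch} z\,dt\,dz}{(\operatorname{ch} t \operatorname{ch} z \operatorname{ch} \zeta - \operatorname{sh} \zeta \operatorname{sh} z)^{n-1}}. \]

Instead of t it is convenient to introduce a new variable $\tau$ so that

\[ \operatorname{ch} t \operatorname{ch} z \operatorname{ch} \zeta - \operatorname{sh} \zeta \operatorname{sh} z = \tau^{-1} \operatorname{ch}(z - \zeta). \]

Then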

p _ n - 2 T" /chzy dz fV-Hl - ry-^dr 

Tv/2j_-W vT^ pr 

where 

^ ™ (?_ ihJ!} < ~f~ f) 

^ 2chzch^ ^ 2cho)ch^ 
for all values of z under consideration. Now 


and 


since 


rV '*(1 — r)" ^dr 
jo \/l — pr 


■s/rl'in — 1) 

~r(n -w 


i 

I 


h-i(l - tY-Ht 
y/i - pr 

h-i(l - T)" -^dr 
\/l — pr 


V ^r(n - 1)/ P \ 

r(n - i)'\ "^2^ - y 

< |* T~i(l — t )’‘~'''(1 + pT)dT 


for 0 < p < 1 as can be easily verified. Consequently 


p — (n — 2)r(n — 1) T" /chz\^ dz 

V^r(n J> «~ f)]^ 

. r 1 _ir. ^ 1. 

[ 2cha)ch^ 2n — ij’ 


0 < 0 < 1. 


As to the integral in this formula, its approximate expression, omitting 
terms of the higher order, is: 





2n - 3® 


Thus for somewhat large n the required value of P can be found with 
the help of a simple approximate formula. 

The various distributions dealt with in this chapter are undoubtedly
of great value when applied to variables which have normal or nearly
normal distribution. Whether they are always used legitimately can
be doubted. At least the "onus probandi" that the "populations" with
which they deal are even approximately normal rests with the statisticians.


Problems for Solution 


1. Show that 


Urn 

n—► •• 


n 

-*!_ f 


•Wl - 




du 


1 

\/ 2ir( 


j: 


'^du 


Hint: Liapounoff's theorem and Prob. 1, page 332.

2. With the same assumptions and notations as in Prob. 3, page 336, show that the
distribution function of the quotient



i = 1, 2, . . . n 


is 



It is worthy of notice that for n = 4 the distribution is uniform.

3. In two series of observations, samples $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_{n'}$ from
the same normally distributed population (or of the same normally distributed
variable) are obtained. Denoting for brevity



find the distribution function of the quotient ("Student"). Ans.


APPENDIX I


1. Euler's Summation Formula. Let f(x) be a function with a
continuous derivative f'(x) in an interval (a, b) where a and b > a are
arbitrary real numbers. The notation

\[ \sum_{n > a}^{n \le b} f(n) \]

will be used to designate the sum extended over all integers n which are
> a and $\le$ b. It is an important problem to devise means for the
approximate evaluation of the above sum when it contains a considerable number
of terms.

Let [x], as usual, denote the largest integer contained in a real number
x, so that

\[ x = [x] + \theta \]

where $\theta$, the so-called "fractional part" of x, satisfies the inequalities

\[ 0 \le \theta < 1. \]

Considered as functions of a continuous variable x, both [x] and $\theta$ have
discontinuities for integral values of x. The function

\[ \rho(x) = \tfrac{1}{2} - \theta = [x] - x + \tfrac{1}{2} \]

is likewise discontinuous for integral values of x. Besides, it is a periodic
function of x with the period 1; that is, we have

\[ \rho(x + 1) = \rho(x) \]

for any real x. With this notation adopted we have the following
important formula:

\[ (1) \qquad \sum_{n > a}^{n \le b} f(n) = \int_a^b f(x)\,dx + \rho(b) f(b) - \rho(a) f(a) - \int_a^b \rho(x) f'(x)\,dx \]

which is known as "Euler's summation formula."

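Proof. Let k be the least integer > a and l the greatest integer $\le$ b.
The sum in the left member of (1) is, by definition,

\[ f(k) + f(k + 1) + \cdots + f(l) \]

and we must show that this is equal to the right member. To this end
we write first

\[ \int_a^b \rho(x) f'(x)\,dx = \int_a^k \rho(x) f'(x)\,dx + \int_l^b \rho(x) f'(x)\,dx + \sum_{j=k}^{l-1} \int_j^{j+1} \rho(x) f'(x)\,dx. \]

Next, since j is an integer,

\[ \int_j^{j+1} \rho(x) f'(x)\,dx = \int_j^{j+1} \left( j - x + \tfrac{1}{2} \right) f'(x)\,dx = -\frac{f(j) + f(j + 1)}{2} + \int_j^{j+1} f(x)\,dx \]

and

\[ \sum_{j=k}^{l-1} \int_j^{j+1} \rho(x) f'(x)\,dx = -\frac{f(k) + f(l)}{2} - \sum_{n=k+1}^{l-1} f(n) + \int_k^l f(x)\,dx. \]

On the other hand,

\[ \int_a^k \rho(x) f'(x)\,dx = \int_a^k \left( k - 1 - x + \tfrac{1}{2} \right) f'(x)\,dx = -\frac{f(k)}{2} - \rho(a) f(a) + \int_a^k f(x)\,dx \]

\[ \int_l^b \rho(x) f'(x)\,dx = \int_l^b \left( l - x + \tfrac{1}{2} \right) f'(x)\,dx = -\frac{f(l)}{2} + \rho(b) f(b) + \int_l^b f(x)\,dx, \]

so that finally

\[ \int_a^b \rho(x) f'(x)\,dx = -f(k) - f(k + 1) - \cdots - f(l) + \rho(b) f(b) - \rho(a) f(a) + \int_a^b f(x)\,dx; \]

whence

\[ \sum_{n > a}^{n \le b} f(n) = \int_a^b f(x)\,dx + \rho(b) f(b) - \rho(a) f(a) - \int_a^b \rho(x) f'(x)\,dx, \]

which completes the proof of Euler's formula.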
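Euler's summation formula can be checked exactly in a simple case. The sketch below takes $f(x) = x^2$ as an assumed test function (any polynomial would do) and uses exact rational arithmetic; the function names are hypothetical and introduced only for this illustration.

```python
from fractions import Fraction
import math


def rho(x):
    """rho(x) = [x] - x + 1/2, with [x] the largest integer contained in x."""
    return math.floor(x) - x + Fraction(1, 2)


def rho_integral_x_squared(a, b):
    """Exact integral of rho(x) f'(x) over (a, b) for f(x) = x^2."""
    total = Fraction(0)
    left = a
    while left < b:
        j = math.floor(left)
        right = min(b, Fraction(j + 1))
        # On [j, j + 1): rho(x) * 2x = (2j + 1) x - 2 x^2, whose antiderivative is
        # (2j + 1) x^2 / 2 - 2 x^3 / 3.
        def antiderivative(x, j=j):
            return Fraction(2 * j + 1, 2) * x ** 2 - Fraction(2, 3) * x ** 3
        total += antiderivative(right) - antiderivative(left)
        left = right
    return total


def euler_right_member(a, b):
    """Right member of formula (1) for f(x) = x^2."""
    integral_f = (b ** 3 - a ** 3) / 3
    return integral_f + rho(b) * b ** 2 - rho(a) * a ** 2 - rho_integral_x_squared(a, b)


def direct_sum(a, b):
    """Sum of n^2 over integers n with a < n <= b."""
    return sum(n * n for n in range(math.floor(a) + 1, math.floor(b) + 1))
```

With a = 3/10 and b = 57/10 both members equal 1 + 4 + 9 + 16 + 25 = 55; the identity also holds when a or b is an integer, in which case $\rho$ takes the value 1/2 there.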

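Corollary 1. The integral

\[ \int_0^x \rho(z)\,dz = \sigma(x) \]

represents a continuous and periodic function of x with the period 1. For

\[ \sigma(x + 1) - \sigma(x) = \int_x^{x+1} \rho(z)\,dz = \int_0^1 \rho(z)\,dz = \int_0^1 \left( \tfrac{1}{2} - z \right) dz = 0. \]

If $0 \le x \le 1$,

\[ \sigma(x) = \int_0^x \left( \tfrac{1}{2} - z \right) dz = \frac{x(1 - x)}{2} \]

and in general

\[ \sigma(x) = \frac{\theta(1 - \theta)}{2} \]

where $\theta$ is the fractional part of x. Hence, for every real x

\[ 0 \le \sigma(x) \le \tfrac{1}{8}. \]

Supposing that f''(x) exists and is continuous in (a, b) and integrating by
parts, we get

\[ \int_a^b \rho(x) f'(x)\,dx = \sigma(b) f'(b) - \sigma(a) f'(a) - \int_a^b \sigma(x) f''(x)\,dx, \]

which leads to another form of Euler's formula:

\[ \sum_{n > a}^{n \le b} f(n) = \int_a^b f(x)\,dx + \rho(b) f(b) - \rho(a) f(a) - \sigma(b) f'(b) + \sigma(a) f'(a) + \int_a^b \sigma(x) f''(x)\,dx. \]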

Corollary 2. If f(x) is defined for all $x \ge a$ and possesses a continuous
derivative throughout the interval $(a, +\infty)$; if, besides, the integral

\[ \int_a^\infty \rho(x) f'(x)\,dx \]

exists, then for a variable limit b we have

\[ (2) \qquad \sum_{n > a}^{n \le b} f(n) = C + \int f(b)\,db + \rho(b) f(b) + \int_b^\infty \rho(x) f'(x)\,dx \]

where C is a constant with respect to b.

It suffices to substitute for

\[ \int_a^b \rho(x) f'(x)\,dx \]

the difference

\[ \int_a^\infty \rho(x) f'(x)\,dx - \int_b^\infty \rho(x) f'(x)\,dx \]

and separate the terms depending upon b from those involving a.

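2. Stirling's Formula. Factorials increase with extreme rapidity
and their exact computation soon becomes practically impossible. The
question then naturally arises of finding a convenient approximate
expression for large factorials, which question is answered by a celebrated
formula usually known as "Stirling's formula," although, in the main,
it was established by de Moivre in connection with problems on
probability. De Moivre did not establish the relation to the number

\[ \pi = 3.14159 \ldots \]

of the constant involved in his formula; it was done by Stirling.

In formula (2) it suffices to take $a = \tfrac{1}{2}$, $f(x) = \log x$ and replace b
by an arbitrary integer n to arrive at the remarkable expression

\[ \log (1 \cdot 2 \cdot 3 \cdots n) = C + \left( n + \tfrac{1}{2} \right) \log n - n + \int_n^\infty \frac{\rho(x)\,dx}{x} \]

where C is a constant. For the sake of brevity we shall set

\[ \omega(n) = \int_n^\infty \frac{\rho(x)\,dx}{x}. \]

Now

\[ \int_n^\infty \frac{\rho(x)\,dx}{x} = \int_n^{n+1} \frac{\rho(x)\,dx}{x} + \int_{n+1}^{n+2} \frac{\rho(x)\,dx}{x} + \cdots \]

and

\[ \int_k^{k+1} \frac{\rho(x)\,dx}{x} = \int_0^1 \frac{\rho(u)\,du}{u + k} = \int_0^{\frac{1}{2}} \frac{\rho(u)\,du}{u + k} + \int_{\frac{1}{2}}^1 \frac{\rho(u)\,du}{u + k} = \int_0^{\frac{1}{2}} \frac{\left( \frac{1}{2} - u \right) du}{u + k} - \int_0^{\frac{1}{2}} \frac{\left( \frac{1}{2} - u \right) du}{k + 1 - u} = \frac{1}{2} \int_0^{\frac{1}{2}} \frac{(1 - 2u)^2\,du}{(k + u)(k + 1 - u)}. \]

Hence

\[ \omega(n) = \frac{1}{4} \int_0^{\frac{1}{2}} (1 - 2u)^2 F_n(u)\,du \]

where

\[ F_n(u) = 2 \sum_{k=n}^{\infty} \frac{1}{(k + u)(k + 1 - u)}. \]

Since

\[ (k + u)(k + 1 - u) = k(k + 1) + u - u^2 \]

it follows that for $0 < u < \tfrac{1}{2}$

\[ (k + u)(k + 1 - u) > k(k + 1) \]

\[ (k + u)(k + 1 - u) < \left( k + \tfrac{1}{2} \right)^2 < \left( k + \tfrac{1}{2} \right)\left( k + \tfrac{3}{2} \right). \]

Thus for $0 < u < \tfrac{1}{2}$

\[ F_n(u) < 2 \sum_{k=n}^{\infty} \frac{1}{k(k + 1)} = \frac{2}{n} \]

\[ F_n(u) > 2 \sum_{k=n}^{\infty} \frac{1}{\left( k + \frac{1}{2} \right)\left( k + \frac{3}{2} \right)} = \frac{2}{n + \frac{1}{2}}. \]

Making use of these limits, we find that

\[ \omega(n) < \frac{1}{2n} \int_0^{\frac{1}{2}} (1 - 2u)^2\,du = \frac{1}{12n}, \qquad \omega(n) > \frac{1}{12\left( n + \frac{1}{2} \right)} \]

and consequently can set

\[ \omega(n) = \frac{1}{12(n + \theta)} \]

where

\[ 0 < \theta < \tfrac{1}{2}. \]

Accordingly

\[ \log (1 \cdot 2 \cdot 3 \cdots n) = C + \left( n + \tfrac{1}{2} \right) \log n - n + \frac{1}{12(n + \theta)}. \]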


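The constant C depends in a remarkable way on the number $\pi$.
To show this we start from the well-known expression for $\pi$ due to Wallis:

\[ \frac{\pi}{2} = \lim_{n \to \infty} \left( \frac{2}{1} \cdot \frac{2}{3} \cdot \frac{4}{3} \cdot \frac{4}{5} \cdots \frac{2n}{2n - 1} \cdot \frac{2n}{2n + 1} \right), \]

which follows from the infinite product

\[ \sin x = x \left( 1 - \frac{x^2}{\pi^2} \right)\left( 1 - \frac{x^2}{4\pi^2} \right)\left( 1 - \frac{x^2}{9\pi^2} \right) \cdots \]

by taking $x = \pi/2$. Since

\[ \frac{2}{1} \cdot \frac{2}{3} \cdot \frac{4}{3} \cdot \frac{4}{5} \cdots \frac{2n}{2n - 1} \cdot \frac{2n}{2n + 1} = \left[ \frac{2 \cdot 4 \cdot 6 \cdots 2n}{1 \cdot 3 \cdot 5 \cdots (2n - 1)} \right]^2 \frac{1}{2n + 1} \]

we get from Wallis' formula

\[ \sqrt{\pi} = \lim_{n \to \infty} \left[ \frac{2 \cdot 4 \cdot 6 \cdots 2n}{1 \cdot 3 \cdot 5 \cdots (2n - 1)} \cdot \frac{1}{\sqrt{n}} \right]. \]

On the other hand,

\[ 2 \cdot 4 \cdot 6 \cdots 2n = 2^n \cdot 1 \cdot 2 \cdot 3 \cdots n \]
\[ 1 \cdot 3 \cdot 5 \cdots (2n - 1) = \frac{1 \cdot 2 \cdot 3 \cdots 2n}{2^n \cdot 1 \cdot 2 \cdot 3 \cdots n} \]

so that

\[ \sqrt{\pi} = \lim_{n \to \infty} \left\{ \frac{2^{2n} (1 \cdot 2 \cdot 3 \cdots n)^2}{1 \cdot 2 \cdot 3 \cdots 2n} \cdot \frac{1}{\sqrt{n}} \right\} \]

or, taking logarithms

\[ \log \sqrt{\pi} = \lim \left[ 2n \log 2 + 2 \log (1 \cdot 2 \cdot 3 \cdots n) - \log (1 \cdot 2 \cdot 3 \cdots 2n) - \tfrac{1}{2} \log n \right]. \]

But, neglecting infinitesimals,

\[ \log (1 \cdot 2 \cdot 3 \cdots n) = C + \left( n + \tfrac{1}{2} \right) \log n - n \]

\[ \log (1 \cdot 2 \cdot 3 \cdots 2n) = C + \left( 2n + \tfrac{1}{2} \right) \log 2n - 2n \]

whence

\[ \lim \left[ 2n \log 2 + 2 \log (1 \cdot 2 \cdot 3 \cdots n) - \log (1 \cdot 2 \cdot 3 \cdots 2n) - \tfrac{1}{2} \log n \right] = C - \tfrac{1}{2} \log 2. \]

Thus

\[ \log \sqrt{\pi} = C - \tfrac{1}{2} \log 2, \qquad C = \log \sqrt{2\pi} \]

and finally

\[ (3) \qquad \log (1 \cdot 2 \cdot 3 \cdots n) = \log \sqrt{2\pi} + \left( n + \tfrac{1}{2} \right) \log n - n + \frac{1}{12(n + \theta)}; \qquad 0 < \theta < \tfrac{1}{2}. \]

This is equivalent to two inequalities

\[ e^{\frac{1}{12n + 6}} < \frac{1 \cdot 2 \cdot 3 \cdots n}{\sqrt{2\pi n}\,n^n e^{-n}} < e^{\frac{1}{12n}} \]

which show that for indefinitely increasing n

\[ \lim \frac{1 \cdot 2 \cdot 3 \cdots n}{\sqrt{2\pi n}\,n^n e^{-n}} = 1. \]

This result is commonly known as Stirling's formula.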
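The two inequalities equivalent to (3) lend themselves to a direct numerical check. The sketch below works with logarithms to avoid overflow; it relies on the standard-library function `math.lgamma`, for which `lgamma(n + 1)` equals log n!, and the helper names are assumptions made here.

```python
import math


def log_stirling_ratio(n):
    """log of n! / (sqrt(2 pi n) n^n e^(-n)), computed through lgamma."""
    log_factorial = math.lgamma(n + 1)
    log_approximation = 0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n
    return log_factorial - log_approximation


def bounds_hold(n):
    """True when 1/(12n + 6) < log ratio < 1/(12n), as the inequalities assert."""
    value = log_stirling_ratio(n)
    return 1.0 / (12 * n + 6) < value < 1.0 / (12 * n)
```

For n = 1 the logarithm of the ratio is about 0.0811, which indeed lies between 1/18 and 1/12. For very large n the gap between the exact value and 1/(12n) shrinks like $1/n^3$, so a floating-point check of this kind is only meaningful for moderate n.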
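For a finite n we have

\[ 1 \cdot 2 \cdot 3 \cdots n = \sqrt{2\pi n}\,n^n e^{-n} \cdot e^{\omega(n)} \]

where

\[ \frac{1}{12\left( n + \frac{1}{2} \right)} < \omega(n) < \frac{1}{12n}. \]

The expression

\[ \sqrt{2\pi n}\,n^n e^{-n} \]

is thus an approximate value of the factorial $1 \cdot 2 \cdot 3 \cdots n$ for large n
in the sense that the ratio of both is near to 1; that is, the relative error is
small. On the contrary, the absolute error will be arbitrarily large for
large n, but this is irrelevant when Stirling's approximation is applied
to quotients of factorials.

In this connection it is useful to derive two further inequalities.

Let m < n; we have, then,

\[ F_m(u) - F_n(u) = 2 \sum_{k=m}^{n-1} \frac{1}{(k + u)(k + 1 - u)} \]

and further, supposing $0 < u < \tfrac{1}{2}$,

\[ F_m(u) - F_n(u) < 2 \sum_{k=m}^{n-1} \frac{1}{k(k + 1)} = \frac{2}{m} - \frac{2}{n} \]

\[ F_m(u) - F_n(u) > 2 \sum_{k=m}^{n-1} \frac{1}{\left( k + \frac{1}{2} \right)\left( k + \frac{3}{2} \right)} = \frac{2}{m + \frac{1}{2}} - \frac{2}{n + \frac{1}{2}}. \]

Hence,

\[ \omega(m) - \omega(n) < \frac{1}{12m} - \frac{1}{12n}; \qquad \omega(m) - \omega(n) > \frac{1}{12\left( m + \frac{1}{2} \right)} - \frac{1}{12\left( n + \frac{1}{2} \right)} \]

and, if l is a third arbitrary positive integer,

\[ \omega(m) + \omega(l) - \omega(n) < \frac{1}{12m} + \frac{1}{12l} - \frac{1}{12n} \]

\[ \omega(m) + \omega(l) - \omega(n) > \frac{1}{12\left( m + \frac{1}{2} \right)} + \frac{1}{12\left( l + \frac{1}{2} \right)} - \frac{1}{12\left( n + \frac{1}{2} \right)}. \]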

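3. Some Definite Integrals. The value of the important definite
integral

\[ \int_0^\infty e^{-t^2}\,dt \]

can be found in various ways. One of the simplest is the following: Let

\[ J_n = \int_0^\infty e^{-t^2} t^n\,dt \]

in general, where n is an arbitrary integer $\ge 0$. Integrating by parts one
can easily establish the recurrence relation

\[ J_n = \frac{n - 1}{2} J_{n-2} \]

whence

\[ J_{2m} = \frac{1 \cdot 3 \cdot 5 \cdots (2m - 1)}{2^m} J_0 \]

\[ J_{2m+1} = \frac{1 \cdot 2 \cdot 3 \cdots m}{2}. \]

On the other hand,

\[ J_{n+1} + 2\lambda J_n + \lambda^2 J_{n-1} = \int_0^\infty e^{-t^2} t^{n-1} (t + \lambda)^2\,dt, \]

which shows that

\[ J_{n+1} + 2\lambda J_n + \lambda^2 J_{n-1} > 0 \]

for all real $\lambda$. Hence, the roots of the polynomial in the left member are
imaginary, and this implies

\[ J_n^2 < J_{n+1} J_{n-1}. \]

Taking n = 2m and n = 2m + 1 and using the preceding expression
for $J_{2m}$ and $J_{2m+1}$, we find

\[ \frac{2 \cdot 4 \cdot 6 \cdots 2m}{1 \cdot 3 \cdot 5 \cdots (2m - 1)} \cdot \frac{1}{\sqrt{4m + 2}} < J_0 < \frac{2 \cdot 4 \cdot 6 \cdots 2m}{1 \cdot 3 \cdot 5 \cdots (2m - 1)} \cdot \frac{1}{\sqrt{4m}}. \]

But

\[ \lim_{m \to \infty} \frac{2 \cdot 4 \cdot 6 \cdots 2m}{1 \cdot 3 \cdot 5 \cdots (2m - 1)} \cdot \frac{1}{\sqrt{m}} = \sqrt{\pi}; \]

hence

\[ J_0 = \int_0^\infty e^{-t^2}\,dt = \frac{\sqrt{\pi}}{2}. \]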

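Here substituting $t = \sqrt{a}\,u$, where a is a positive parameter, we get

\[ \int_0^\infty e^{-au^2}\,du = \frac{1}{2}\sqrt{\frac{\pi}{a}}. \]

As a generalization of the last integral we may consider the following one:

\[ V = \int_0^\infty e^{-au^2} \cos bu\,du. \]

The simplest way to find the value of this integral is to take the derivative

\[ \frac{dV}{db} = -\int_0^\infty e^{-au^2} \sin bu \cdot u\,du \]

and transform the right member by partial integration. The result is

\[ \frac{dV}{db} = -\frac{b}{2a} V \]

or

\[ \frac{d\left( V e^{\frac{b^2}{4a}} \right)}{db} = 0, \]

whence

\[ V = C e^{-\frac{b^2}{4a}}. \]

To determine the constant C, take b = 0; then

\[ C = (V)_{b=0} = \frac{1}{2}\sqrt{\frac{\pi}{a}} \]

so that finally

\[ \int_0^\infty e^{-au^2} \cos bu\,du = \frac{1}{2}\sqrt{\frac{\pi}{a}}\,e^{-\frac{b^2}{4a}}. \]

The equivalent form of this integral is as follows:

\[ \int_{-\infty}^{\infty} e^{-au^2} \cos bu\,du = \sqrt{\frac{\pi}{a}}\,e^{-\frac{b^2}{4a}}. \]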
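The closed form for the integral of $e^{-au^2} \cos bu$ can be compared with a direct quadrature. This is a minimal sketch: the function names, the truncation of the infinite range at $10/\sqrt{a}$ (beyond which the integrand is below $e^{-100}$), and the number of Simpson steps are assumptions made here.

```python
import math


def gaussian_cosine_integral(a, b, steps=20000):
    """Simpson approximation of the integral of e^(-a u^2) cos(b u) over (0, inf), a > 0."""
    upper = 10.0 / math.sqrt(a)  # the neglected tail is smaller than e^(-100)
    h = upper / steps  # steps is assumed even

    def g(u):
        return math.exp(-a * u * u) * math.cos(b * u)

    total = g(0.0) + g(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(k * h)
    return total * h / 3.0


def closed_form(a, b):
    """The value (1/2) sqrt(pi / a) e^(-b^2 / (4a)) obtained in the text."""
    return 0.5 * math.sqrt(math.pi / a) * math.exp(-b * b / (4.0 * a))
```

With b = 0 and a = 1 the sketch reduces to the integral $J_0 = \sqrt{\pi}/2$ evaluated just before.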



APPENDIX II 


METHOD OF MOMENTS AND ITS APPLICATIONS 


1. Introductory Remarks. To prove the fundamental limit theorem
Tshebysheff devised an ingenious method, known as the "method of
moments," which later was completed and simplified by one of the most
prominent among Tshebysheff's disciples, the late Markoff. The
simplicity and elegance inherent in this method of moments make it
advisable to present in this Appendix a brief exposition of it.

The distribution of a mass spread over a given interval (a, b) may be
characterized by a never decreasing function $\varphi(x)$, defined in (a, b)
and varying from $\varphi(a) = 0$ to $\varphi(b) = m_0$, where $m_0$ is the total mass
contained in (a, b). Since $\varphi(x)$ is never decreasing, for any particular point
$x_0$, both the limits

\[ \lim \varphi(x_0 - \epsilon) = \varphi(x_0 - 0) \]
\[ \lim \varphi(x_0 + \epsilon) = \varphi(x_0 + 0) \]

exist when a positive number $\epsilon$ tends to 0. Evidently

\[ \varphi(x_0 - 0) \le \varphi(x_0) \le \varphi(x_0 + 0). \]

If

\[ \varphi(x_0 - 0) = \varphi(x_0 + 0) = \varphi(x_0), \]

then $x_0$ is a "point of continuity" of $\varphi(x)$. In case

\[ \varphi(x_0 + 0) > \varphi(x_0 - 0), \]

$x_0$ is a point of discontinuity of $\varphi(x)$, and the positive difference

\[ \varphi(x_0 + 0) - \varphi(x_0 - 0) \]

may be considered as a mass concentrated at the point $x_0$. In all cases
$\varphi(x_0 - 0)$ is the total mass on the segment $(a, x_0)$ excluding the end point
$x_0$, whereas $\varphi(x_0 + 0)$ is the mass spread over the same segment including
the point $x_0$.

The points of discontinuity, if there are any, form an enumerable set,
whence it follows that in any part of the interval (a, b) there are points of
continuity.

If for any sufficiently small positive $\epsilon$

\[ \varphi(x_0 + \epsilon) > \varphi(x_0 - \epsilon), \]

$x_0$ is called a "point of increase" of $\varphi(x)$. There is at least one point of
increase and there might be infinitely many. For instance, if

\[ \varphi(x) = 0 \quad \text{for} \quad a \le x \le c \]

\[ \varphi(x) = m_0 \quad \text{for} \quad c < x \le b, \]

then c is the only point of increase. On the other hand, for

\[ \varphi(x) = m_0\,\frac{x - a}{b - a} \]

every point of the interval (a, b) is a point of increase. In case of a
finite number of points of increase the whole mass is concentrated in
these points and the distribution function $\varphi(x)$ is a step function with a
finite number of steps.

Stieltjes' integrals

\[ \int_a^b d\varphi(x) = m_0, \quad \int_a^b x\,d\varphi(x) = m_1, \quad \ldots \quad \int_a^b x^i\,d\varphi(x) = m_i \]

represent respectively the whole mass $m_0$ and its moments about the
origin of the order 1, 2, ... i. When the distribution function $\varphi(x)$
is given, moments $m_0, m_1, m_2, \ldots, m_i$ (provided they exist) are
determined. If, however, these moments are given and are known to originate
in a certain distribution of a mass over (a, b), the question may be raised:
with what error can the mass spread over an interval (a, x) be determined
by these data? In other words, given $m_0, m_1, m_2, \ldots, m_i$, what are the
precise upper and lower bounds of a mass spread over an interval (a, x)?
Such is the question raised by Tshebysheff in a short but important article
"Sur les valeurs limites des intégrales" (1874).¹ The results contained
in this article, including very remarkable inequalities which indeed are of
fundamental importance, are given without proof. The first proof of
these results and the complete solution of the question raised by
Tshebysheff was given by Markoff in his eminent thesis "On some applications
of algebraic continued fractions" (St. Petersburg, 1884), written in
Russian and therefore comparatively little known.

¹ Jour. Liouville, Ser. 2, T. XIX, 1874.

Suppose that $\rho_i$ is the limit of the error with which we can evaluate the
mass belonging to the interval (a, x) or, which is almost the same, the
value of $\varphi(x)$, when moments $m_0, m_1, m_2, \ldots, m_i$ are given. If, with i
tending to infinity, $\rho_i$ tends to 0 for any given x, then the distribution
function $\varphi(x)$ will be completely determined by giving all the moments

\[ m_0, m_1, m_2, \ldots. \]

One case of this kind, that in which

\[ m_0 = 1, \qquad m_{2k} = \frac{1 \cdot 3 \cdot 5 \cdots (2k - 1)}{2^k}, \qquad m_{2k+1} = 0 \]

was considered by Tshebysheff in a later paper, "Sur deux théorèmes
relatifs aux probabilités" (1887)¹ devoted to the application of his
method to the proof of the limit theorem under certain rather general
conditions. The success of this proof is due to the fact that moments,
as given above, uniquely determine the normal distribution

\[ \varphi(x) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{x} e^{-u^2}\,du \]

of the mass 1 over the infinite interval $(-\infty, +\infty)$.

After these preliminary remarks and before proceeding to an orderly
exposition of the method of moments, it is advisable to devote a few pages
to continued fractions associated with power series, for continued
fractions are the natural tools in questions of the kind we shall consider.

¹ Oeuvres complètes de P. L. Tshebysheff, Tome 2, p. 482.

2. Continued Fractions Associated with Power Series. Let

\[ \phi(z) = \frac{A_1}{z^{\alpha_1}} + \frac{A_2}{z^{\alpha_2}} + \frac{A_3}{z^{\alpha_3}} + \cdots \qquad (A_1 \ne 0) \]

be a power series arranged according to decreasing powers of z where the
smallest exponent $\alpha_1$ is positive. We consider this power series from a
purely formal point of view merely as a means to form a sequence of
rational fractions

\[ \frac{A_1}{z^{\alpha_1}}, \quad \frac{A_1}{z^{\alpha_1}} + \frac{A_2}{z^{\alpha_2}}, \quad \frac{A_1}{z^{\alpha_1}} + \frac{A_2}{z^{\alpha_2}} + \frac{A_3}{z^{\alpha_3}}, \quad \ldots \]

and we need not be concerned about its convergence.

Evidently $1/\phi(z)$ can again be expanded into a power series, arranged
according to decreasing powers of z. Let its integral part, containing
non-negative powers of z, be denoted by $q_1(z)$, and let the fractional part

\[ \frac{B_1}{z^{\beta_1}} + \frac{B_2}{z^{\beta_2}} + \cdots, \]

containing negative powers of z, be denoted by $-\phi_1(z)$, so that

\[ \frac{1}{\phi(z)} = q_1(z) - \phi_1(z). \]

In the same way

\[ \frac{1}{\phi_1(z)} \]

can be represented thus:

\[ \frac{1}{\phi_1(z)} = q_2(z) - \phi_2(z) \]

where $q_2(z)$ is a polynomial and

\[ \phi_2(z) = \frac{C_1}{z^{\gamma_1}} + \frac{C_2}{z^{\gamma_2}} + \frac{C_3}{z^{\gamma_3}} + \cdots \]

a power series containing only negative powers of z. Further, we shall
have

\[ \frac{1}{\phi_2(z)} = q_3(z) - \phi_3(z) \]

with a certain polynomial $q_3(z)$ and a power series

\[ \phi_3(z) = \frac{D_1}{z^{\delta_1}} + \frac{D_2}{z^{\delta_2}} + \frac{D_3}{z^{\delta_3}} + \cdots \]

containing negative powers of z, and so on. Thus we are led to consider a
continued fraction (finite or infinite)

\[ (1) \qquad \cfrac{1}{q_1 - \cfrac{1}{q_2 - \cfrac{1}{q_3 - \cdots}}} \]

associated with $\phi(z)$ in the sense that the formal expansion of

\[ \cfrac{1}{q_1 - \cfrac{1}{q_2 - \cdots - \cfrac{1}{q_i - \phi_i}}} \]

into a power series will reproduce exactly $\phi(z)$. The continued fraction
(1) is again considered from a purely formal standpoint as a mere
abbreviation of the sequence of its convergents


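\[ \frac{P_1}{Q_1} = \frac{1}{q_1}, \qquad \frac{P_2}{Q_2} = \cfrac{1}{q_1 - \cfrac{1}{q_2}}, \qquad \frac{P_3}{Q_3} = \cfrac{1}{q_1 - \cfrac{1}{q_2 - \cfrac{1}{q_3}}}, \quad \ldots \]

The polynomials

\[ P_1, P_2, P_3, \ldots \]
\[ Q_1, Q_2, Q_3, \ldots \]

can be found step by step by the recurrence relations

\[ (2) \qquad \begin{aligned} P_i &= q_i P_{i-1} - P_{i-2}, & P_1 &= 1, & P_0 &= 0 \\ Q_i &= q_i Q_{i-1} - Q_{i-2}, & Q_1 &= q_1, & Q_0 &= 1 \end{aligned} \qquad i = 2, 3, 4, \ldots \]

from which the following identical relation follows:

\[ (3) \qquad P_i(z) Q_{i-1}(z) - Q_i(z) P_{i-1}(z) = 1 \]

showing that all fractions

\[ \frac{P_i(z)}{Q_i(z)} \]

are irreducible. Evidently degrees of consecutive denominators of
convergents form an increasing sequence and the degree of $Q_i(z)$ is at
least i. Since

\[ \phi(z) = \cfrac{1}{q_1 - \cfrac{1}{q_2 - \cdots - \cfrac{1}{q_{i+1} - \phi_{i+1}}}} \]

we can write

\[ \phi(z) = \frac{P_i (q_{i+1} - \phi_{i+1}(z)) - P_{i-1}}{Q_i (q_{i+1} - \phi_{i+1}(z)) - Q_{i-1}} = \frac{P_{i+1} - P_i \phi_{i+1}(z)}{Q_{i+1} - Q_i \phi_{i+1}(z)} \]

in the sense that the formal development of the right-hand member is
identical with $\phi(z)$. By virtue of relation (3)

\[ \phi(z) - \frac{P_i}{Q_i} = \frac{1}{Q_i (Q_{i+1} - Q_i \phi_{i+1})}. \]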

The degree of Qi being X» and that of Qi+i being \i^i, the expansion of 

Qi(Qi+i — Qi<t>x-\.0 


in a series of descending powers of z begins with the power 
Hence, 


<t>{z) - 


Pi 

Qi 


M 

^Xi-fXi+i 


+ • • • 


and, since X,+i S X, + 1, the expansion of 


<t>{z) 


Qi 


begins with a term of the order 2X< + 1 in 1/z at least. This property 
characterizes the convergents Pi/Qt completely. For let P/Q be a 
rational fraction whose denominator is of the nth degree and such that 
in the expansion of 


<t>iz) 


P 

Q 
the lowest term is of the order $2n + 1$ in $1/z$ at least. Then $P/Q$ coincides
with one of the convergents to the continued fraction (1). Let $i$ be
determined by the condition

$$\lambda_i \leq n < \lambda_{i+1}.$$

Then

$$\phi(z) - \frac{P_i}{Q_i} = \frac{M}{z^{\lambda_i + \lambda_{i+1}}} + \cdots, \qquad \phi(z) - \frac{P}{Q} = \frac{N}{z^{2n+1}} + \cdots$$

whence in the expansion of

$$\frac{P}{Q} - \frac{P_i}{Q_i}$$

the lowest term will be of degree $2n + 1$ or $\lambda_i + \lambda_{i+1}$ in $1/z$. Hence, the
degree of

$$PQ_i - P_iQ$$

in $z$ is not greater than one of the two numbers

$$\lambda_i - n - 1 \quad \text{and} \quad n - \lambda_{i+1}$$

which are both negative, while

$$PQ_i - P_iQ$$

is a polynomial. Hence, identically,

$$PQ_i - P_iQ = 0$$

or

$$\frac{P}{Q} = \frac{P_i}{Q_i}$$

which proves the statement.
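The recurrence relations (2) and the identity (3) are easy to check mechanically. The following is a minimal sketch, not part of the original exposition: polynomials are stored as coefficient lists (degree 0 first), and the partial quotients `qs` are hypothetical values chosen only for illustration.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (degree 0 first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_sub(a, b):
    """Difference a - b, with trailing zero coefficients removed."""
    n = max(len(a), len(b))
    a = a + [Fraction(0)] * (n - len(a))
    b = b + [Fraction(0)] * (n - len(b))
    out = [x - y for x, y in zip(a, b)]
    while len(out) > 1 and out[-1] == 0:
        out.pop()
    return out

def convergents(qs):
    """Numerators P_i and denominators Q_i built by the recurrence relations (2)."""
    P = [[Fraction(0)], [Fraction(1)]]   # P_0 = 0, P_1 = 1
    Q = [[Fraction(1)], list(qs[0])]     # Q_0 = 1, Q_1 = q_1
    for q in qs[1:]:
        P.append(poly_sub(poly_mul(q, P[-1]), P[-2]))
        Q.append(poly_sub(poly_mul(q, Q[-1]), Q[-2]))
    return P, Q

# Hypothetical partial quotients: q_1 = z, q_2 = 2z + 1, q_3 = z^2 - 1.
qs = [[Fraction(0), Fraction(1)],
      [Fraction(1), Fraction(2)],
      [Fraction(-1), Fraction(0), Fraction(1)]]
P, Q = convergents(qs)

# Left member of identity (3) for i = 1, 2, 3; each entry should be the constant 1.
identity = [poly_sub(poly_mul(P[i], Q[i - 1]), poly_mul(Q[i], P[i - 1]))
            for i in range(1, len(P))]
degrees = [len(q) - 1 for q in Q]
```

With these quotients the denominators have degrees 0, 1, 2, 4, which illustrates that the degrees increase but need not increase by exactly one unit.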


3. Continued Fraction Associated with $\int_a^b \frac{d\varphi(x)}{z - x}$. Let $\varphi(x)$ be a never
decreasing function characterizing the distribution of a mass over an
interval $(a, b)$. The moments of this distribution up to the moment of
the order $2n$ are represented by integrals

$$m_0 = \int_a^b d\varphi(x), \quad m_1 = \int_a^b x\,d\varphi(x), \quad m_2 = \int_a^b x^2\,d\varphi(x), \quad \ldots \quad m_{2n} = \int_a^b x^{2n}\,d\varphi(x).$$
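Let

$$\Delta_0 = m_0; \quad \Delta_1 = \begin{vmatrix} m_0 & m_1 \\ m_1 & m_2 \end{vmatrix}; \quad \Delta_2 = \begin{vmatrix} m_0 & m_1 & m_2 \\ m_1 & m_2 & m_3 \\ m_2 & m_3 & m_4 \end{vmatrix}; \quad \ldots \quad \Delta_n = \begin{vmatrix} m_0 & m_1 & \cdots & m_n \\ m_1 & m_2 & \cdots & m_{n+1} \\ \vdots & \vdots & & \vdots \\ m_n & m_{n+1} & \cdots & m_{2n} \end{vmatrix}.$$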


If $\varphi(x)$ has not less than $n + 1$ points of increase, we must have

$$\Delta_0 > 0, \quad \Delta_1 > 0, \quad \ldots \quad \Delta_n > 0,$$

and conversely, if these inequalities are satisfied, $\varphi(x)$ has at least $n + 1$
points of increase. To prove this, consider the quadratic form

$$\phi = \int_a^b (t_0 + t_1x + \cdots + t_nx^n)^2\,d\varphi(x)$$

in $n + 1$ variables $t_0, t_1, \ldots t_n$. Evidently

$$\phi = \sum m_{i+j}t_it_j \quad (i, j = 0, 1, 2, \ldots n)$$

so that $\Delta_n$ is the determinant of $\phi$ and $\Delta_0, \Delta_1, \ldots \Delta_{n-1}$ its principal
minors. The form $\phi$ cannot vanish unless $t_0 = t_1 = \cdots = t_n = 0$.
For if $x = \xi$ is a point of increase and $\phi = 0$, we must have also

$$\int_{\xi-\epsilon}^{\xi+\epsilon} (t_0 + t_1x + \cdots + t_nx^n)^2\,d\varphi(x) = 0$$

for an arbitrary positive $\epsilon$, whence by the mean value theorem

$$(t_0 + t_1\eta + \cdots + t_n\eta^n)^2\int_{\xi-\epsilon}^{\xi+\epsilon} d\varphi(x) = 0 \quad (\xi - \epsilon < \eta < \xi + \epsilon)$$

or

$$t_0 + t_1\eta + \cdots + t_n\eta^n = 0$$

because

$$\int_{\xi-\epsilon}^{\xi+\epsilon} d\varphi(x) > 0.$$

Letting $\epsilon$ converge to 0, we conclude

$$t_0 + t_1\xi + \cdots + t_n\xi^n = 0$$

at any point of increase. Since there are at least $n + 1$ points of increase
the equation

$$t_0 + t_1x + \cdots + t_nx^n = 0$$

would have at least $n + 1$ roots and that necessitates

$$t_0 = t_1 = \cdots = t_n = 0.$$
Hence, the quadratic form $\phi$, which is never negative, can vanish
only if all its variables vanish; that is, $\phi$ is a definite positive form. Its
determinant $\Delta_n$ and all its principal minors $\Delta_{n-1}, \Delta_{n-2}, \ldots \Delta_0$ must be
positive, which proves the first statement.

Suppose the conditions

$$\Delta_0 > 0, \quad \Delta_1 > 0, \quad \ldots \quad \Delta_n > 0$$

satisfied and let $\varphi(x)$ have $s < n + 1$ points of increase. Then the
integral representing $\phi$ reduces to a finite sum

$$\phi = p_1(t_0 + t_1\xi_1 + \cdots + t_n\xi_1^n)^2 + p_2(t_0 + t_1\xi_2 + \cdots + t_n\xi_2^n)^2 + \cdots + p_s(t_0 + t_1\xi_s + \cdots + t_n\xi_s^n)^2$$

denoting by $p_1, p_2, \ldots p_s$ masses concentrated in the $s$ points of
increase $\xi_1, \xi_2, \ldots \xi_s$. Now, since $s \leq n$, constants $t_0, t_1, \ldots t_n$, not
all zero, can be determined by the system of equations

$$\begin{aligned} t_0 + t_1\xi_1 + \cdots + t_n\xi_1^n &= 0 \\ t_0 + t_1\xi_2 + \cdots + t_n\xi_2^n &= 0 \\ \cdots \\ t_0 + t_1\xi_s + \cdots + t_n\xi_s^n &= 0. \end{aligned}$$

Thus $\phi$ vanishes when not all variables vanish; hence, its determinant
$\Delta_n = 0$, contrary to hypothesis.
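The criterion can be tried on small discrete distributions. The sketch below is an illustration only: the point sets and masses are hypothetical, and the determinants $\Delta_0, \Delta_1, \ldots \Delta_n$ are computed exactly with rational arithmetic.

```python
from fractions import Fraction

def det(matrix):
    """Determinant by exact Gaussian elimination over the rationals."""
    a = [row[:] for row in matrix]
    n, sign, result = len(a), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign * result

def hankel_dets(points, masses, n):
    """Delta_0 .. Delta_n for masses concentrated in the given points."""
    m = [sum(p * Fraction(x) ** k for x, p in zip(points, masses))
         for k in range(2 * n + 1)]
    return [det([[m[i + j] for j in range(k + 1)] for i in range(k + 1)])
            for k in range(n + 1)]

# Hypothetical data: three points of increase, then only two, both with n = 2.
three_points = hankel_dets([-1, 0, 2], [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)], 2)
two_points = hankel_dets([-1, 2], [Fraction(1, 2), Fraction(1, 2)], 2)
```

With three points of increase all of $\Delta_0, \Delta_1, \Delta_2$ come out positive; with only two points $\Delta_2$ vanishes, in agreement with the converse statement.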

From now on we shall assume that $\varphi(x)$ has at least $n + 1$ points of
increase. The integral

$$\int_a^b \frac{d\varphi(x)}{z - x}$$

can be expanded into a formal power series of $1/z$, thus

$$\int_a^b \frac{d\varphi(x)}{z - x} = \frac{m_0}{z} + \frac{m_1}{z^2} + \frac{m_2}{z^3} + \cdots + \frac{m_{2n}}{z^{2n+1}} + \cdots$$

and this power series can be converted into a continued fraction as
explained in Sec. 2. Let

$$\frac{P_1}{Q_1}, \quad \frac{P_2}{Q_2}, \quad \ldots \quad \frac{P_n}{Q_n}, \quad \frac{P_{n+1}}{Q_{n+1}}$$

be the first $n + 1$ convergents to that continued fraction. I say that the
degrees of their denominators are, respectively, $1, 2, 3, \ldots n + 1$.
Since these degrees form an increasing sequence, it suffices to show that
there exists a convergent with the denominator of a given degree

$$s \leq n + 1.$$

This convergent $P/Q$ is completely determined by the condition that in a
formal expansion of the difference

$$\int_a^b \frac{d\varphi(x)}{z - x} - \frac{P}{Q}$$

into a power series of $1/z$, terms involving $1/z, 1/z^2, \ldots 1/z^{2s}$ are
absent. This is the same as to say that in the expansion of

$$Q(z)\int_a^b \frac{d\varphi(x)}{z - x} - P(z)$$

there are no terms involving $1/z, 1/z^2, \ldots 1/z^s$. The preceding expression
can be written thus:

$$Q(z)\int_a^b \frac{d\varphi(x)}{z - x} - P(z) = \int_a^b \frac{Q(z) - Q(x)}{z - x}\,d\varphi(x) - P(z) + \int_a^b \frac{Q(x)\,d\varphi(x)}{z - x}.$$

Since

$$\int_a^b \frac{Q(z) - Q(x)}{z - x}\,d\varphi(x) - P(z)$$

is a polynomial in $z$, it must vanish identically. That gives

$$P(z) = \int_a^b \frac{Q(z) - Q(x)}{z - x}\,d\varphi(x). \tag{4}$$

To determine $Q(z)$ we must express the conditions that in the expansion of

$$\int_a^b \frac{Q(x)\,d\varphi(x)}{z - x}$$

terms in $1/z, 1/z^2, \ldots 1/z^s$ vanish. These conditions are equivalent to
$s$ relations

$$\int_a^b Q(x)\,d\varphi(x) = 0, \quad \int_a^b xQ(x)\,d\varphi(x) = 0, \quad \ldots \quad \int_a^b x^{s-1}Q(x)\,d\varphi(x) = 0, \tag{5}$$

which in turn amount to the single requirement that

$$\int_a^b \theta(x)Q(x)\,d\varphi(x) = 0 \tag{6}$$

for an arbitrary polynomial $\theta(x)$ of degree $\leq s - 1$.

Conversely, if there exists a polynomial $Q(z)$ of degree $s$ satisfying
conditions (6), and $P(z)$ is determined by equation (4), then $P(z)/Q(z)$ is a
convergent whose denominator is of degree $s$. For then the expansion of

$$\int_a^b \frac{d\varphi(x)}{z - x} - \frac{P(z)}{Q(z)}$$

lacks the terms in $1/z, 1/z^2, \ldots 1/z^{2s}$.
Let

$$Q(z) = l_0 + l_1z + \cdots + l_{s-1}z^{s-1} + z^s.$$

Then equations (5) become

$$\begin{aligned} m_0l_0 + m_1l_1 + m_2l_2 + \cdots + m_{s-1}l_{s-1} + m_s &= 0 \\ m_1l_0 + m_2l_1 + m_3l_2 + \cdots + m_sl_{s-1} + m_{s+1} &= 0 \\ \cdots \\ m_{s-1}l_0 + m_sl_1 + m_{s+1}l_2 + \cdots + m_{2s-2}l_{s-1} + m_{2s-1} &= 0. \end{aligned}$$

This system of linear equations determines completely the coefficients
$l_0, l_1, \ldots l_{s-1}$ since its determinant $\Delta_{s-1} > 0$.

The existence of a convergent with the denominator of degree

$$s \leq n + 1$$

being established, it follows that the denominator of the $s$th convergent
$P_s/Q_s$ is exactly of degree $s$. The denominator is determined, except
for a constant factor, and can be presented in the form:

$$\begin{vmatrix} 1 & z & z^2 & \cdots & z^s \\ m_0 & m_1 & m_2 & \cdots & m_s \\ m_1 & m_2 & m_3 & \cdots & m_{s+1} \\ \vdots & \vdots & \vdots & & \vdots \\ m_{s-1} & m_s & m_{s+1} & \cdots & m_{2s-1} \end{vmatrix}.$$


A remarkable result follows from equation (6) by taking $Q = Q_s$ and
$\theta = Q_{s'}$ with $s' < s$, namely,

$$\int_a^b Q_sQ_{s'}\,d\varphi(x) = 0 \quad \text{if } s' \neq s \tag{7}$$

while

$$\int_a^b Q_s^2\,d\varphi(x) > 0 \quad (s \leq n).$$

In the general relation

$$Q_s = q_sQ_{s-1} - Q_{s-2}$$

the polynomial $q_s$ must be of the first degree

$$q_s = \alpha_sz + \beta_s,$$

which shows that the continued fraction associated with

$$\int_a^b \frac{d\varphi(x)}{z - x}$$

has the form

$$\cfrac{1}{\alpha_1z + \beta_1 - \cfrac{1}{\alpha_2z + \beta_2 - \cfrac{1}{\alpha_3z + \beta_3 - \cdots}}}$$
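The next question is, how to determine the constants $\alpha_s$ and $\beta_s$. Multiplying
both members of the equation

$$Q_s = (\alpha_sz + \beta_s)Q_{s-1} - Q_{s-2} \quad (s \geq 2)$$

by $Q_{s-2}\,d\varphi(z)$, integrating between limits $a$ and $b$, and taking into account
(7), we get

$$0 = \alpha_s\int_a^b zQ_{s-1}Q_{s-2}\,d\varphi(z) - \int_a^b Q_{s-2}^2\,d\varphi(z).$$

On the other hand, the highest terms in $Q_{s-1}$ and $Q_{s-2}$ are

$$\alpha_1\alpha_2 \cdots \alpha_{s-1}z^{s-1}, \quad \alpha_1\alpha_2 \cdots \alpha_{s-2}z^{s-2}.$$

Hence,

$$zQ_{s-2} = \frac{Q_{s-1}}{\alpha_{s-1}} + \psi$$

where $\psi$ is a polynomial of degree $\leq s - 2$. Referring to equation (6),
we have

$$\int_a^b zQ_{s-2}Q_{s-1}\,d\varphi(z) = \frac{1}{\alpha_{s-1}}\int_a^b Q_{s-1}^2\,d\varphi(z)$$

and consequently

$$\alpha_s = \alpha_{s-1}\frac{\int_a^b Q_{s-2}^2\,d\varphi(z)}{\int_a^b Q_{s-1}^2\,d\varphi(z)}. \tag{8}$$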

Suppose that the following moments are given: $m_0, m_1, \ldots m_{2n}$; how
many of the coefficients $\alpha_s$ can be found? Evidently $\alpha_1 = 1/m_0$. Furthermore,
$Q_0 = 1$ and $Q_1$ is completely determined given $m_0$ and $m_1$.
Relation (8) determines $\alpha_2$, and $Q_2$ will be completely determined given
$m_0, m_1, m_2, m_3$. The same relation again determines $\alpha_3$, and $Q_3$ will be
determined given $m_0, m_1, \ldots m_5$. Proceeding in the same way, we
conclude that, given $m_0, m_1, m_2, \ldots m_{2n}$, all the polynomials

$$Q_0, Q_1, Q_2, \ldots Q_n$$

as well as constants

$$\alpha_1, \alpha_2, \alpha_3, \ldots \alpha_{n+1}$$

can be determined. It is important to note that all these constants are
positive.

Proceeding in a similar manner, the following expression can be found

$$\beta_s = -\alpha_s\frac{\int_a^b zQ_{s-1}^2\,d\varphi(z)}{\int_a^b Q_{s-1}^2\,d\varphi(z)}.$$

It follows that constants

$$\beta_1, \beta_2, \ldots \beta_n$$

are determined by our data, but not $\beta_{n+1}$. For if $s = n + 1$, the integral

$$\int_a^b zQ_n^2\,d\varphi(z)$$

can be expressed as a linear function of $m_0, m_1, \ldots m_{2n+1}$ with known
coefficients. But $m_{2n+1}$ is not included among our data; hence, $\beta_{n+1}$
cannot be determined.
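The step-by-step determination just described can be sketched in a few lines. This is an illustration under stated assumptions, not the author's own procedure written out: the function names are hypothetical, polynomials are coefficient lists (degree 0 first), and the sample moments $1, 0, \tfrac12, 0, \tfrac34$ are those of the normal density $e^{-x^2}/\sqrt{\pi}$ considered in Sec. 7.

```python
from fractions import Fraction

def integrate(poly, moments):
    """Integral of a polynomial against d(phi), expressed through the moments."""
    return sum(c * moments[k] for k, c in enumerate(poly))

def mul(a, b):
    """Product of two coefficient lists (degree 0 first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def denominators(moments, n):
    """Q_0..Q_n, alpha_1..alpha_{n+1} and beta_1..beta_n from m_0..m_{2n}."""
    x = [Fraction(0), Fraction(1)]
    Q, alphas, betas = [[Fraction(1)]], [Fraction(1) / moments[0]], []
    for s in range(1, n + 1):
        prev = Q[-1]                                   # Q_{s-1}
        norm = integrate(mul(prev, prev), moments)     # integral of Q_{s-1}^2
        beta = -alphas[-1] * integrate(mul(x, mul(prev, prev)), moments) / norm
        betas.append(beta)
        new = mul([beta, alphas[-1]], prev)            # (alpha_s z + beta_s) Q_{s-1}
        if s >= 2:                                     # subtract Q_{s-2}
            padded = Q[-2] + [Fraction(0)] * len(new)
            new = [a - b for a, b in zip(new, padded)]
        Q.append(new)
        # Relation (8) with s replaced by s + 1 gives alpha_{s+1}.
        alphas.append(alphas[-1] * norm / integrate(mul(new, new), moments))
    return Q, alphas, betas

moments = [Fraction(1), Fraction(0), Fraction(1, 2), Fraction(0), Fraction(3, 4)]
Q, alphas, betas = denominators(moments, 2)
```

For these data the sketch yields $Q_1 = x$, $Q_2 = 2x^2 - 1$, $\alpha_1 = 1$, $\alpha_2 = 2$, $\alpha_3 = \tfrac12$ and $\beta_1 = \beta_2 = 0$; note that $\beta_3$ is not computed, since it would require $m_5$.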

4. Properties of Polynomials $Q_s$. Theorem. Roots of the equation

$$Q_s(z) = 0 \quad (s \leq n)$$

are real, simple, and contained within the interval $(a, b)$.

Proof. Let $Q_s(z)$ change its sign $r < s$ times when $z$ passes through
points $z_1, z_2, \ldots z_r$ contained strictly within $(a, b)$. Setting

$$\theta(z) = (z - z_1)(z - z_2) \cdots (z - z_r)$$

the product

$$\theta(z)Q_s(z)$$

does not change its sign when $z$ increases from $a$ to $b$. However,

$$\int_a^b \theta(z)Q_s(z)\,d\varphi(z) = 0,$$

and this necessitates that

$$\theta(z)Q_s(z)$$

or $Q_s(z)$ vanishes in all points of increase of $\varphi(z)$. But this is impossible,
since by hypothesis there are at least $n + 1$ points of increase, whereas
the degree $s$ of $Q_s$ does not exceed $n$. Consequently, $Q_s(z)$ changes its
sign in the interval $(a, b)$ exactly $s$ times and has all its roots real, simple,
and located within $(a, b)$.

It follows from this theorem that the convergent

$$\frac{P_n}{Q_n}$$

can be resolved into a sum of simple fractions as follows:

$$\frac{P_n(z)}{Q_n(z)} = \frac{A_1}{z - z_1} + \frac{A_2}{z - z_2} + \cdots + \frac{A_n}{z - z_n} \tag{9}$$

where $z_1, z_2, \ldots z_n$ are roots of the equation $Q_n(z) = 0$ and in general

$$A_\alpha = \frac{P_n(z_\alpha)}{Q_n'(z_\alpha)}.$$

The right member of (9) can be expanded into power series of $1/z$, the
coefficient of $1/z^k$ being

$$\sum_{\alpha=1}^n A_\alpha z_\alpha^{k-1}.$$

By the property of convergents we must have the following equations:

$$\sum_{\alpha=1}^n A_\alpha = m_0, \quad \sum_{\alpha=1}^n A_\alpha z_\alpha = m_1, \quad \ldots \quad \sum_{\alpha=1}^n A_\alpha z_\alpha^{2n-1} = m_{2n-1}.$$

These equations can be condensed into one,

$$\sum_{\alpha=1}^n A_\alpha T(z_\alpha) = \int_a^b T(z)\,d\varphi(z) \tag{10}$$

which should hold for any polynomial $T(z)$ of degree $\leq 2n - 1$.
Let us take for $T(z)$ a polynomial of degree $2n - 2$:

$$T(z) = \left[\frac{Q_n(z)}{(z - z_\alpha)Q_n'(z_\alpha)}\right]^2.$$

Then

$$T(z_\alpha) = 1, \quad T(z_\beta) = 0 \text{ if } \beta \neq \alpha$$

and consequently, by virtue of equation (10),

$$A_\alpha = \int_a^b \left[\frac{Q_n(z)}{(z - z_\alpha)Q_n'(z_\alpha)}\right]^2 d\varphi(z) > 0.$$

Thus constants $A_1, A_2, \ldots A_n$ are all positive, which shows that $P_n(z_k)$
has the same sign as $Q_n'(z_k)$. Now in the sequence

$$Q_n'(z_1), Q_n'(z_2), \ldots Q_n'(z_n)$$

any two consecutive terms are of opposite signs. The same being true of
the sequence

$$P_n(z_1), P_n(z_2), \ldots P_n(z_n),$$

it follows that the roots of $P_n(z)$ are all simple, real, and located in the
intervals

$$(z_1, z_2); \quad (z_2, z_3); \quad \ldots \quad (z_{n-1}, z_n).$$

Finally, we shall prove the following theorem:
Theorem. For any real $x$

$$Q_n'(x)Q_{n-1}(x) - Q_{n-1}'(x)Q_n(x)$$

is a positive number.

Proof. From the relations

$$\begin{aligned} Q_s(z) &= (\alpha_sz + \beta_s)Q_{s-1}(z) - Q_{s-2}(z) \\ Q_s(x) &= (\alpha_sx + \beta_s)Q_{s-1}(x) - Q_{s-2}(x) \end{aligned}$$

it follows that

$$\frac{Q_s(z)Q_{s-1}(x) - Q_s(x)Q_{s-1}(z)}{z - x} = \alpha_sQ_{s-1}(z)Q_{s-1}(x) + \frac{Q_{s-1}(z)Q_{s-2}(x) - Q_{s-1}(x)Q_{s-2}(z)}{z - x}$$

whence, taking $s = 1, 2, 3, \ldots n$ and adding results,

$$\frac{Q_n(z)Q_{n-1}(x) - Q_n(x)Q_{n-1}(z)}{z - x} = \sum_{s=1}^n \alpha_sQ_{s-1}(z)Q_{s-1}(x).$$

It suffices now to take $z = x$ to arrive at the identity

$$Q_n'(x)Q_{n-1}(x) - Q_{n-1}'(x)Q_n(x) = \sum_{s=1}^n \alpha_sQ_{s-1}(x)^2.$$

Since $Q_0 = 1$ and $\alpha_s > 0$, it is evident that

$$Q_n'(x)Q_{n-1}(x) - Q_{n-1}'(x)Q_n(x) > 0$$

for every real $x$.

5. Equivalent Point Distributions. If the whole mass can be concentrated
in a finite number of points so as to produce the same $l$ first
moments as a given distribution, we have an "equivalent point distribution"
in respect to the $l$ first moments. In what follows we shall suppose
that the whole mass is spread over an infinite interval $-\infty, \infty$ and that
the given moments, originating in a distribution with at least $n + 1$
points of increase, are

$$m_0, m_1, m_2, \ldots m_{2n}.$$

The question is: Is it possible to find an equivalent point distribution
where the whole mass is concentrated in $n + 1$ points? Let the unknown
points be

$$\xi_1, \xi_2, \ldots \xi_{n+1}$$

and the masses concentrated in them

$$A_1, A_2, \ldots A_{n+1}.$$

Evidently the question will be answered in the affirmative if the system
of $2n + 1$ equations

$$\sum_{\alpha=1}^{n+1} A_\alpha = m_0, \quad \sum_{\alpha=1}^{n+1} A_\alpha\xi_\alpha = m_1, \quad \sum_{\alpha=1}^{n+1} A_\alpha\xi_\alpha^2 = m_2, \quad \ldots \quad \sum_{\alpha=1}^{n+1} A_\alpha\xi_\alpha^{2n} = m_{2n} \tag{A}$$

can be satisfied by real numbers $\xi_1, \xi_2, \ldots \xi_{n+1}$; $A_1, A_2, \ldots A_{n+1}$,
the last $n + 1$ numbers being positive. The number of unknowns being
greater by one unit than the number of equations, we can introduce the
additional requirement that one of the numbers $\xi_1, \xi_2, \ldots \xi_{n+1}$ should
be equal to a given real number $v$. The system (A) may be replaced by
the single requirement that the equation

$$\sum_{\alpha=1}^{n+1} A_\alpha T(\xi_\alpha) = \int_{-\infty}^{\infty} T(x)\,d\varphi(x) \tag{11}$$

shall hold for any polynomial $T(x)$ of degree $\leq 2n$. Let $Q(x)$ be the
polynomial of degree $n + 1$ having roots $\xi_1, \xi_2, \ldots \xi_{n+1}$ and let $\theta(x)$ be
an arbitrary polynomial of degree $\leq n - 1$. Then we can apply equation
(11) to

$$T(x) = \theta(x)Q(x).$$
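Since $Q(\xi_\alpha) = 0$, we shall have

$$\int_{-\infty}^{\infty} \theta(x)Q(x)\,d\varphi(x) = 0 \tag{12}$$

for an arbitrary polynomial $\theta(x)$ of degree $\leq n - 1$. Presently we shall
see that requirement (12) together with $Q(v) = 0$ determines $Q(x)$, save
for a constant factor if

$$Q_n(v) \neq 0.$$

Dividing $Q(x)$ by $Q_n(x)$, we have identically

$$Q(x) = (\lambda x + \mu)Q_n(x) + R_{n-1}(x)$$

where $R_{n-1}(x)$ is a polynomial of degree $\leq n - 1$. If $\theta(x)$ is an arbitrary
polynomial of degree $\leq n - 2$,

$$(\lambda x + \mu)\theta(x)$$

will be of degree $\leq n - 1$. Hence

$$\int_{-\infty}^{\infty} (\lambda x + \mu)\theta(x)Q_n(x)\,d\varphi(x) = 0$$

by (6), and (12) shows that

$$\int_{-\infty}^{\infty} \theta(x)R_{n-1}(x)\,d\varphi(x) = 0$$

for an arbitrary polynomial $\theta(x)$ of degree $\leq n - 2$. The last requirement
shows that $R_{n-1}(x)$ differs from $Q_{n-1}(x)$ by a constant factor. Since
the highest coefficient in $Q(x)$ is arbitrary, we can set

$$R_{n-1}(x) = -Q_{n-1}(x).$$

In the equation

$$Q(x) = (\lambda x + \mu)Q_n(x) - Q_{n-1}(x)$$

it remains to determine constants $\lambda$ and $\mu$. Multiplying both members by
$Q_{n-1}(x)\,d\varphi(x)$ and integrating between $-\infty$ and $\infty$, we get

$$\lambda\int_{-\infty}^{\infty} xQ_{n-1}Q_n\,d\varphi(x) = \int_{-\infty}^{\infty} Q_{n-1}^2\,d\varphi(x)$$

or

$$\frac{\lambda}{\alpha_n}\int_{-\infty}^{\infty} Q_n^2\,d\varphi(x) = \int_{-\infty}^{\infty} Q_{n-1}^2\,d\varphi(x).$$

But, by relation (8),

$$\frac{\int_{-\infty}^{\infty} Q_{n-1}^2\,d\varphi(x)}{\int_{-\infty}^{\infty} Q_n^2\,d\varphi(x)} = \frac{\alpha_{n+1}}{\alpha_n}$$

whence

$$\lambda = \alpha_{n+1}.$$

The equation

$$0 = Q(v) = (\alpha_{n+1}v + \mu)Q_n(v) - Q_{n-1}(v)$$

serves to determine $\mu$ if $Q_n(v) \neq 0$. The final expression of $Q(x)$ will be

$$Q(x) = \left[\alpha_{n+1}(x - v) + \frac{Q_{n-1}(v)}{Q_n(v)}\right]Q_n(x) - Q_{n-1}(x).$$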

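Owing to recurrence relations

$$\begin{aligned} Q &= \left[\alpha_{n+1}(x - v) + \frac{Q_{n-1}(v)}{Q_n(v)}\right]Q_n - Q_{n-1} \\ Q_n &= (\alpha_nx + \beta_n)Q_{n-1} - Q_{n-2} \\ \cdots \\ Q_3 &= (\alpha_3x + \beta_3)Q_2 - Q_1 \\ Q_2 &= (\alpha_2x + \beta_2)Q_1 - Q_0 \end{aligned}$$

it is evident that

$$Q, Q_n, Q_{n-1}, \ldots Q_1, Q_0 = 1$$

is a Sturm series. For $x = -\infty$ it contains $n + 1$ variations and for
$x = \infty$ only permanences. It follows that the equation

$$Q(x) = 0$$

has exactly $n + 1$ distinct real roots and among them $v$. Thus, if the
problem is solvable, the numbers $\xi_1, \xi_2, \ldots \xi_{n+1}$ are determined as
roots of

$$Q(x) = 0.$$

Furthermore, all unknowns $A_\alpha$ will be positive. In fact, from equation
(11) it follows that

$$A_\alpha = \int_{-\infty}^{\infty} \left[\frac{Q(x)}{(x - \xi_\alpha)Q'(\xi_\alpha)}\right]^2 d\varphi(x) > 0.$$

Now we must show that constants $A_\alpha$ can actually be determined so as
to satisfy equations (A). To this end let

$$P(x) = \left[\alpha_{n+1}(x - v) + \frac{Q_{n-1}(v)}{Q_n(v)}\right]P_n(x) - P_{n-1}(x).$$

Then

$$Q(x)\int_{-\infty}^{\infty} \frac{d\varphi(z)}{x - z} - P(x) = \int_{-\infty}^{\infty} \frac{Q(z)\,d\varphi(z)}{x - z}$$

and, on account of (12), the expansion of the right member into power
series of $1/x$ lacks the terms in $1/x, 1/x^2, \ldots 1/x^n$. Hence, the expansion
of

$$\int_{-\infty}^{\infty} \frac{d\varphi(z)}{x - z} - \frac{P(x)}{Q(x)}$$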
lacks the terms in $1/x, 1/x^2, \ldots 1/x^{2n+1}$; that is,

$$\frac{P(x)}{Q(x)} = \frac{m_0}{x} + \frac{m_1}{x^2} + \cdots + \frac{m_{2n}}{x^{2n+1}} + \cdots$$

On the other hand, resolving in simple fractions,

$$\frac{P(x)}{Q(x)} = \frac{A_1}{x - \xi_1} + \frac{A_2}{x - \xi_2} + \cdots + \frac{A_{n+1}}{x - \xi_{n+1}}.$$

Expanding the right member into power series of $1/x$ and comparing
with the preceding expansion, we obtain the system (A). By the previous
remark all constants $A_\alpha$ are positive. Thus, there exists a point distribution
in which masses concentrated in $n + 1$ points produce moments
$m_0, m_1, \ldots m_{2n}$. One of these points $v$ may be taken arbitrarily, with
the condition

$$Q_n(v) \neq 0$$

being observed, however.
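The simplest case $n = 1$ can be worked out by hand as a check on the construction. The sketch below assumes, purely for illustration, the data $m_0 = 1$, $m_1 = 0$, $m_2 = \tfrac12$ (the first moments of the normal density $e^{-x^2}/\sqrt{\pi}$ of Sec. 7); this choice of data is an assumption of the example and not part of the general argument. Here $\alpha_1 = 1/m_0 = 1$, $\beta_1 = 0$, so that $Q_0 = 1$, $Q_1 = x$, $P_0 = 0$, $P_1 = 1$, and relation (8) gives $\alpha_2 = \alpha_1 m_0/m_2 = 2$. For a given $v \neq 0$,

$$Q(x) = \left[2(x - v) + \frac{1}{v}\right]x - 1 = (x - v)\left(2x + \frac{1}{v}\right), \qquad P(x) = 2(x - v) + \frac{1}{v},$$

so that the two points are $v$ and $-\dfrac{1}{2v}$, and the masses $P(\xi)/Q'(\xi)$ concentrated in them are

$$A(v) = \frac{1}{1 + 2v^2}, \qquad A\!\left(-\frac{1}{2v}\right) = \frac{2v^2}{1 + 2v^2}.$$

One verifies directly that

$$\frac{1}{1 + 2v^2} + \frac{2v^2}{1 + 2v^2} = 1, \qquad \frac{v}{1 + 2v^2} - \frac{1}{2v}\cdot\frac{2v^2}{1 + 2v^2} = 0, \qquad \frac{v^2}{1 + 2v^2} + \frac{1}{4v^2}\cdot\frac{2v^2}{1 + 2v^2} = \frac12,$$

in agreement with the system (A) for $n = 1$.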

6. Tshebysheff's Inequalities. In a note referred to in the introduction
Tshebysheff made known certain inequalities of the utmost importance
for the theory we are concerned with. The first very ingenious
proof of them was given by Markoff in 1884 and, by a remarkable
coincidence, the same proof was rediscovered almost at the same time
by Stieltjes. A few years later, Stieltjes found another totally different
proof; and it is this second proof that we shall follow.

Let $\varphi(x)$ be a distribution function of a mass spread over the interval
$-\infty, \infty$. Supposing that a moment of the order $i$,

$$\int_{-\infty}^{\infty} x^i\,d\varphi(x) = m_i,$$

exists, we shall show first that

$$\lim l^i(m_0 - \varphi(l)) = 0$$

$$\lim l^i\varphi(-l) = 0$$

when $l$ tends to $+\infty$. For

$$\int_l^{\infty} x^i\,d\varphi(x) \geq l^i\int_l^{\infty} d\varphi(x) = l^i[\varphi(+\infty) - \varphi(l)]$$

or

$$l^i(m_0 - \varphi(l)) \leq \int_l^{\infty} x^i\,d\varphi(x).$$

Similarly

$$\left|\int_{-\infty}^{-l} x^i\,d\varphi(x)\right| \geq l^i\int_{-\infty}^{-l} d\varphi(x) = l^i\varphi(-l)$$

or

$$l^i\varphi(-l) \leq \left|\int_{-\infty}^{-l} x^i\,d\varphi(x)\right|.$$

Now both integrals

$$\int_l^{\infty} x^i\,d\varphi(x) \quad \text{and} \quad \int_{-\infty}^{-l} x^i\,d\varphi(x)$$

converge to 0 as $l$ tends to $+\infty$; whence both statements follow immediately.
Integrating by parts, we have

$$\int_0^l x^i\,d\varphi(x) = l^i[\varphi(l) - m_0] - i\int_0^l [\varphi(x) - m_0]x^{i-1}\,dx$$

$$\int_{-l}^0 x^i\,d\varphi(x) = (-1)^{i-1}l^i\varphi(-l) - i\int_{-l}^0 x^{i-1}\varphi(x)\,dx,$$

whence, letting $l$ converge to $+\infty$,

$$m_i = -i\int_0^{\infty} [\varphi(x) - m_0]x^{i-1}\,dx - i\int_{-\infty}^0 x^{i-1}\varphi(x)\,dx.$$

If the same mass $m_0$, with the same moment $m_i$, is spread according to
the law characterized by the function $\psi(x)$, we shall have

$$m_i = -i\int_0^{\infty} [\psi(x) - m_0]x^{i-1}\,dx - i\int_{-\infty}^0 x^{i-1}\psi(x)\,dx,$$

whence

$$\int_{-\infty}^{\infty} x^{i-1}[\varphi(x) - \psi(x)]\,dx = 0. \tag{13}$$

Suppose the moments

$$m_0, m_1, m_2, \ldots m_{2n}$$

of the distribution characterized by $\varphi(x)$ are known. Provided $\varphi(x)$
has at least $n + 1$ points of increase, there exists an equivalent point
distribution, defined in Sec. 5 and characterized by the step function
$\psi(x)$ which can be defined as follows:

$$\psi(x) = \begin{cases} 0 & \text{for } -\infty < x < \xi_1 \\ A_1 & \text{for } \xi_1 \leq x < \xi_2 \\ A_1 + A_2 & \text{for } \xi_2 \leq x < \xi_3 \\ \cdots \\ A_1 + A_2 + \cdots + A_n & \text{for } \xi_n \leq x < \xi_{n+1} \\ A_1 + A_2 + \cdots + A_{n+1} & \text{for } \xi_{n+1} \leq x < +\infty \end{cases}$$

provided roots $\xi_1, \xi_2, \ldots \xi_{n+1}$ of the equation $Q(x) = 0$ are arranged
in an increasing order of magnitude.
Equation (13) will hold for $i = 1, 2, 3, \ldots 2n$ or, which is the
same, the equation

$$\int_{-\infty}^{\infty} \theta(x)[\varphi(x) - \psi(x)]\,dx = 0 \tag{14}$$

will hold for an arbitrary polynomial $\theta(x)$ of degree $\leq 2n - 1$. The
function

$$h(x) = \varphi(x) - \psi(x)$$

in general has ordinary discontinuities. We can prove now that $h(x)$, if
not identically equal to 0 at all points of continuity, changes its sign at
least $2n$ times.¹ Suppose, on the contrary, that it changes sign $r < 2n$
times; namely, at the points

$$a_1, a_2, \ldots a_r.$$

Taking

$$\theta(x) = (x - a_1)(x - a_2) \cdots (x - a_r),$$

equation (14) will be satisfied, while the integrand

$$\theta(x)h(x),$$

if not 0, will be of the same sign, for example, positive. Let $\xi$ be any
point of continuity of $h(x)$. If $\xi = a_i$ ($i = 1, 2, \ldots r$) then $h(a_i) = 0$
since $h(x)$ changes sign at $a_i$. If $\xi$ does not coincide with any one of the
numbers $a_1, a_2, \ldots a_r$, then for an arbitrarily small positive $\epsilon$ we must
have

$$\int_{\xi-\epsilon}^{\xi+\epsilon} \theta(x)h(x)\,dx = 0.$$

But by continuity

$$\theta(x)h(x)$$

remains in the interval $(\xi - \epsilon, \xi + \epsilon)$ for sufficiently small $\epsilon$ above a
certain positive number unless $h(\xi) = 0$. Thus, if $h(x)$ does not vanish
at all points of continuity (in which case $\varphi(x)$ and $\psi(x)$ do not differ
essentially), it must change sign at least $2n$ times.

¹ A function $f(x)$ is said to change sign once in $(a, b)$ if in this interval there
exists a point or points $c$ such that, for instance, $f(x) \geq 0$ in $(a, c)$ and $f(x) \leq 0$ in
$(c, b)$, equality signs not holding throughout the respective intervals. The change
of sign occurs $n$ times if $(a, b)$ can be divided in $n$ intervals in which $f(x)$ changes
sign once.

Let us see now where the change of sign can occur. In the intervals

$$-\infty, \xi_1 \quad \text{and} \quad \xi_{n+1}, +\infty$$
$\varphi(x) - \psi(x)$ evidently cannot change sign. Within each of the intervals

$$\xi_{i-1}, \xi_i$$

there can be at most one change of sign, since $\psi(x)$ remains constant
there, and $\varphi(x)$ can only increase. The sign may change also at the
points of discontinuity of $\psi(x)$; that is, at the points $\xi_1, \xi_2, \ldots \xi_{n+1}$.
Altogether, $\varphi(x) - \psi(x)$ cannot change sign more than $2n + 1$ times
and not less than $2n$ times.

Since $\psi(x) = 0$ so far as $x < \xi_1$ and $\varphi(\xi_1 - \epsilon)$ is not negative for
positive $\epsilon$, we must have

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) \geq 0.$$

Also $\psi(x) = m_0$ for $x > \xi_{n+1}$ and $\varphi(x) \leq m_0$, so that

$$\varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) \leq 0.$$

At first let us suppose

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) > 0, \quad \varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) < 0.$$

In this case $\varphi(x) - \psi(x)$ must change sign an odd number of times; that is,
not less than $2n + 1$ times. Since this cannot happen more than $2n + 1$
times, the number of times $\varphi(x) - \psi(x)$ changes its sign must be exactly
$2n + 1$. These changes occur once within each interval

$$\xi_{i-1}, \xi_i$$

and in each of the points $\xi_1, \xi_2, \ldots \xi_{n+1}$. When the change of sign
occurs in the interval $(\xi_{i-1}, \xi_i)$ where $\psi(x)$ remains constant, because $\varphi(x)$
never decreases, we must have for sufficiently small $\epsilon$

$$\varphi(\xi_i - \epsilon) - \psi(\xi_i - \epsilon) > 0. \tag{15}$$

But the sign changes in passing the point $\xi_i$; therefore,

$$\varphi(\xi_i + \epsilon) - \psi(\xi_i + \epsilon) < 0. \tag{16}$$

The equalities

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) = 0, \quad \varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) = 0$$

cannot both hold for all sufficiently small $\epsilon$. For then there would not
be a change of sign at $\xi_1$ and $\xi_{n+1}$, so that the number of changes would
not be greater than $2n - 1$ which is impossible. Therefore, let

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) = 0 \quad \text{and} \quad \varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) < 0.$$

Then there will be exactly $2n$ changes of sign: one in each of the intervals
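$$\xi_{i-1}, \xi_i$$

and in each of the points $\xi_2, \xi_3, \ldots \xi_{n+1}$. The inequalities (15) and
(16) would hold for $i \geq 2$, but

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) = 0, \quad \varphi(\xi_1 + \epsilon) - \psi(\xi_1 + \epsilon) < 0$$

for all sufficiently small $\epsilon$.

Now let

$$\varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) = 0 \quad \text{and} \quad \varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) > 0$$

for all sufficiently small positive $\epsilon$. Then there will be exactly $2n$ changes
of sign: in each of the points $\xi_1, \xi_2, \ldots \xi_n$ and in each of the $n$ intervals

$$\xi_{i-1}, \xi_i.$$

The inequalities (15) and (16) will again hold for $i \leq n$, but

$$\varphi(\xi_{n+1} - \epsilon) - \psi(\xi_{n+1} - \epsilon) > 0 \quad \text{and} \quad \varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) = 0$$

for all sufficiently small $\epsilon$. Letting $\epsilon$ converge to 0, we shall have

$$\varphi(\xi_i - 0) \geq \psi(\xi_i - 0)$$

$$\varphi(\xi_i + 0) \leq \psi(\xi_i + 0)$$

for $i = 1, 2, 3, \ldots n + 1$ in all cases. Then, since

$$\varphi(\xi_i) \geq \varphi(\xi_i - 0); \quad \varphi(\xi_i) \leq \varphi(\xi_i + 0),$$

we shall have also

$$\varphi(\xi_i) \geq \psi(\xi_i - 0)$$

$$\varphi(\xi_i) \leq \psi(\xi_i + 0)$$

or, taking into consideration the definition of the function $\psi(x)$

$$\varphi(\xi_i) \geq \sum_{l=1}^{i-1} \frac{P(\xi_l)}{Q'(\xi_l)}$$

$$\varphi(\xi_i) \leq \sum_{l=1}^{i} \frac{P(\xi_l)}{Q'(\xi_l)}.$$

These are the inequalities to which Tshebysheff's name is justly
attached. For a particular root $\xi_i = v$ they can be written thus:

$$\varphi(v) \geq \sum_{\xi_l < v} \frac{P(\xi_l)}{Q'(\xi_l)}, \qquad \varphi(v) \leq \sum_{\xi_l \leq v} \frac{P(\xi_l)}{Q'(\xi_l)} \tag{17}$$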
with the evident meaning of the extent of summations. Another, less
explicit, form of the same inequalities is

$$\psi(v - 0) \leq \varphi(v) \leq \psi(v + 0). \tag{18}$$

As to $P(x)$ and $Q(x)$, they can be taken in the form:

$$P(x) = [\alpha_{n+1}(x - v)Q_n(v) + Q_{n-1}(v)]P_n(x) - Q_n(v)P_{n-1}(x)$$

$$Q(x) = [\alpha_{n+1}(x - v)Q_n(v) + Q_{n-1}(v)]Q_n(x) - Q_n(v)Q_{n-1}(x).$$

Thus far we have assumed that $v$ was different from any root of the
equation

$$Q_n(x) = 0,$$

but all the results hold, even if

$$Q_n(v) = 0.$$

To prove this, we note first that when a variable $v$ approaches a root $\xi$ of
$Q_n(x)$, one root of $Q(x)$ (either $\xi_1$ or $\xi_{n+1}$) tends to $-\infty$ or $+\infty$, while the
remaining $n$ roots approach the $n$ roots $x_1, x_2, \ldots x_n$ of the equation

$$Q_n(x) = 0.$$

If $\xi_1$ tends to negative infinity, it is easy to see that

$$\frac{P(\xi_1)}{Q'(\xi_1)}$$

tends to 0. In this case the other quotients

$$\frac{P(\xi_l)}{Q'(\xi_l)}; \quad l = 2, 3, \ldots n + 1$$

tend respectively to

$$\frac{P_n(x_1)}{Q_n'(x_1)}, \quad \frac{P_n(x_2)}{Q_n'(x_2)}, \quad \ldots \quad \frac{P_n(x_n)}{Q_n'(x_n)}.$$

If $\xi_{n+1}$ tends to positive infinity the quotients

$$\frac{P(\xi_l)}{Q'(\xi_l)}; \quad l = 1, 2, \ldots n$$

approach respectively

$$\frac{P_n(x_l)}{Q_n'(x_l)}; \quad l = 1, 2, 3, \ldots n,$$

while

$$\frac{P(\xi_{n+1})}{Q'(\xi_{n+1})}$$
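tends to 0. Now take $v = \xi - \epsilon$ and $v = \xi + \epsilon$ in (17) and let the positive
number $\epsilon$ converge to 0. Taking into account the preceding remarks,
we find in the limit

$$\varphi(\xi - 0) \geq \sum_{x_l < \xi} \frac{P_n(x_l)}{Q_n'(x_l)}, \qquad \varphi(\xi + 0) \leq \sum_{x_l \leq \xi} \frac{P_n(x_l)}{Q_n'(x_l)}$$

whence again

$$\varphi(\xi) \geq \sum_{x_l < \xi} \frac{P_n(x_l)}{Q_n'(x_l)}, \qquad \varphi(\xi) \leq \sum_{x_l \leq \xi} \frac{P_n(x_l)}{Q_n'(x_l)}.$$

But these inequalities follow directly from (17) by taking $v = \xi$.
Since

$$\psi(v + 0) - \psi(v - 0) = \frac{P(v)}{Q'(v)}$$

it follows from inequalities (18) that

$$0 \leq \varphi(v) - \psi(v - 0) \leq \frac{P(v)}{Q'(v)}.$$

On the other hand, one easily finds that

$$\frac{P(v)}{Q'(v)} = \frac{1}{\alpha_{n+1}Q_n(v)^2 + Q_n'(v)Q_{n-1}(v) - Q_{n-1}'(v)Q_n(v)}.$$

But referring to the end of Sec. 4,

$$Q_n'(v)Q_{n-1}(v) - Q_{n-1}'(v)Q_n(v) = \sum_{s=1}^n \alpha_sQ_{s-1}(v)^2$$

whence

$$\alpha_{n+1}Q_n(v)^2 + Q_n'(v)Q_{n-1}(v) - Q_{n-1}'(v)Q_n(v) = Q_{n+1}'(v)Q_n(v) - Q_n'(v)Q_{n+1}(v).$$

Finally,

$$0 \leq \varphi(v) - \psi(v - 0) \leq \frac{1}{Q_{n+1}'(v)Q_n(v) - Q_n'(v)Q_{n+1}(v)}.$$

If $\varphi_1(v)$ is another distribution function with the same moments

$$m_0, m_1, \ldots m_{2n},$$

we shall have also

$$0 \leq \varphi_1(v) - \psi(v - 0) \leq \frac{1}{Q_{n+1}'(v)Q_n(v) - Q_n'(v)Q_{n+1}(v)},$$

and as a consequence,

$$|\varphi_1(v) - \varphi(v)| \leq \chi_n(v) \tag{19}$$

—a very important inequality. Here for brevity we use the notation

$$\chi_n(v) = \frac{1}{Q_{n+1}'(v)Q_n(v) - Q_n'(v)Q_{n+1}(v)}.$$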

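The inequality $0 \leq \varphi(v) - \psi(v - 0) \leq \chi_n(v)$ can be tested numerically in the simplest case. The following sketch is an illustration only and rests on assumptions not made in the text at this point: it takes $n = 1$ and the normal distribution with density $e^{-x^2}/\sqrt{\pi}$, for which $Q_0 = 1$, $Q_1 = x$, $\alpha_1 = 1$, $\alpha_2 = 2$, so that $\chi_1(v) = 1/(1 + 2v^2)$ and the equivalent point distribution has masses $1/(1 + 2v^2)$ at $v$ and $2v^2/(1 + 2v^2)$ at $-1/(2v)$. The function names are hypothetical.

```python
import math

def phi(v):
    """Distribution function of the normal law with density exp(-x^2)/sqrt(pi)."""
    return 0.5 * (1.0 + math.erf(v))

def psi_left(v):
    """psi(v - 0) for n = 1: the mass lying strictly to the left of v (v != 0)."""
    # For v > 0 the second point -1/(2v) lies to the left of v; for v < 0 it lies to the right.
    return 2 * v * v / (1 + 2 * v * v) if v > 0 else 0.0

def chi_1(v):
    """chi_1(v) = 1 / (alpha_1 Q_0^2 + alpha_2 Q_1(v)^2) = 1 / (1 + 2 v^2)."""
    return 1.0 / (1 + 2 * v * v)

# Grid of test values of v, excluding v = 0 where Q_1(v) vanishes.
grid = [k / 10 for k in range(-50, 51) if k != 0]
gaps = [(phi(v) - psi_left(v), chi_1(v)) for v in grid]
holds = all(0 <= gap <= bound for gap, bound in gaps)
```

On this grid `holds` is true; for large $|v|$ the gap approaches the bound closely, which shows that the inequality cannot be improved much in this case.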
7. Application to Normal Distribution. An important particular
case is that of a normal distribution characterized by the function

$$\varphi(x) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^x e^{-u^2}\,du.$$

In this case it is easy to give an explicit expression of the polynomials
$Q_n(x)$. Let

$$H_n(x) = e^{x^2}\frac{d^n e^{-x^2}}{dx^n}.$$

Integrating by parts, one can prove that for $l \leq n - 1$

$$\int_{-\infty}^{\infty} e^{-x^2}x^lH_n(x)\,dx = 0.$$

Hence, one may conclude that $Q_n(x)$ differs from $H_n(x)$ by a constant
factor. Let

$$Q_n(x) = c_nH_n(x).$$

To determine $c_n$, we may use the relation

$$H_n(x) = -2xH_{n-1}(x) - 2(n - 1)H_{n-2}(x)$$

which can readily be established. Introducing polynomials $Q_n$, this
relation becomes

$$Q_n(x) = -\frac{2c_n}{c_{n-1}}xQ_{n-1}(x) - 2(n - 1)\frac{c_n}{c_{n-2}}Q_{n-2}(x).$$

Comparison with $Q_n = (\alpha_nx + \beta_n)Q_{n-1} - Q_{n-2}$ shows that $2(n - 1)c_n = c_{n-2}$.
Moreover $c_0 = 1$ and, since $\alpha_1 = 1/m_0 = 1$, we have $-2c_1 = 1$;
whence $c_1 = -\frac{1}{2}$. The knowledge of $c_0$ and $c_1$ together with the relation

$$c_n = \frac{c_{n-2}}{2n - 2}$$

allows determination of all members of the sequence $c_2, c_3, c_4, \ldots$.
The final expressions are as follows:

$$c_{2m} = \frac{1}{2^m \cdot 1 \cdot 3 \cdot 5 \cdots (2m - 1)}$$

$$c_{2m+1} = \frac{-1}{2^{m+1} \cdot 2 \cdot 4 \cdot 6 \cdots 2m}.$$

From the above relation between $H_n(x)$, $H_{n-1}(x)$, $H_{n-2}(x)$ and owing to
the fact that $H_n(x)$ is an even or odd polynomial, according as $n$ is even or
odd, one finds

$$H_{2m}(0) = (-2)^m \cdot 1 \cdot 3 \cdot 5 \cdots (2m - 1),$$

while another relation

$$H_n'(x) = -2nH_{n-1}(x),$$

following from the definition of $H_n(x)$, gives

$$H_{2m-1}'(0) = (-2)^m \cdot 1 \cdot 3 \cdot 5 \cdots (2m - 1).$$

These preliminaries being established, we shall prove now that $\chi_n(v)$
attains its maximum for $v = 0$. Let

$$\Omega(v) = H_{n+1}'(v)H_n(v) - H_n'(v)H_{n+1}(v).$$

Then, taking into account the differential equation for polynomials
$H_n(v)$:

$$H_n''(v) = 2vH_n'(v) - 2nH_n(v)$$

we find that

$$\frac{d\Omega}{dv} = 2v\Omega - 2H_n(v)H_{n+1}(v).$$

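On the other hand,

$$\Omega = -H_{n+1}(v)^2\frac{d}{dv}\frac{H_n(v)}{H_{n+1}(v)}$$

and denoting roots of the polynomial $H_{n+1}(v)$ in general by $\xi$,

$$\frac{d}{dv}\frac{H_n(v)}{H_{n+1}(v)} = -\sum \frac{H_n(\xi)}{H_{n+1}'(\xi)}\frac{1}{(v - \xi)^2}.$$

Consequently

$$\Omega = H_{n+1}(v)^2\sum \frac{H_n(\xi)}{H_{n+1}'(\xi)}\frac{1}{(v - \xi)^2}.$$

Again

$$H_{n+1}'(\xi) = -2(n + 1)H_n(\xi)$$

and so

$$\frac{d\Omega}{dv} = -\frac{H_{n+1}(v)^2}{n + 1}\left[\sum \frac{v}{(v - \xi)^2} - \sum \frac{1}{v - \xi}\right] = -\frac{H_{n+1}(v)^2}{n + 1}\sum \frac{\xi}{(v - \xi)^2}.$$

Roots of the polynomial $H_{n+1}(x)$ being symmetrically located with
respect to 0, we have:

$$\sum \frac{\xi}{(v - \xi)^2} = 4v\sum_{\xi > 0} \frac{\xi^2}{(v^2 - \xi^2)^2}$$

and finally

$$\frac{d\Omega}{dv} = -\frac{4vH_{n+1}(v)^2}{n + 1}\sum_{\xi > 0} \frac{\xi^2}{(v^2 - \xi^2)^2}.$$

Hence

$$\frac{d\Omega}{dv} > 0 \text{ if } v < 0; \qquad \frac{d\Omega}{dv} < 0 \text{ if } v > 0;$$

that is, $\Omega(v)$ attains its maximum for $v = 0$. Since $\chi_n(v) = 1/(c_nc_{n+1}\Omega(v))$,
where both $c_nc_{n+1}$ and $\Omega(v)$ are negative, $\chi_n(v)$ also attains its maximum
for $v = 0$. Referring to the above expressions of $c_{2m}$, $c_{2m+1}$, $H_{2m}(0)$,
$H_{2m+1}'(0)$, we find that

$$\chi_{2m}(0) = \frac{2 \cdot 4 \cdot 6 \cdots 2m}{3 \cdot 5 \cdot 7 \cdots (2m + 1)}$$

$$\chi_{2m+1}(0) = \frac{2 \cdot 4 \cdot 6 \cdots 2m}{3 \cdot 5 \cdot 7 \cdots (2m + 1)}.$$

In Appendix I, page 354, we find the inequality

$$\frac{2 \cdot 4 \cdot 6 \cdots 2m}{1 \cdot 3 \cdot 5 \cdots (2m - 1)} \cdot \frac{1}{\sqrt{4m + 2}} < \frac{\sqrt{\pi}}{2}$$

whence

$$\frac{2 \cdot 4 \cdot 6 \cdots 2m}{3 \cdot 5 \cdot 7 \cdots (2m + 1)} < \sqrt{\frac{\pi}{4m + 2}}.$$

Thus, in all cases

$$\chi_n(v) \leq \chi_n(0) < \sqrt{\frac{\pi}{2n}}$$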
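whence, by virtue of inequality (19),

$$|\varphi_1(v) - \varphi(v)| < \sqrt{\frac{\pi}{2n}}.$$

Thus any distribution function $\varphi_1(v)$ with the moments

$$m_0 = 1, \quad m_{2k-1} = 0, \quad m_{2k} = \frac{1 \cdot 3 \cdot 5 \cdots (2k - 1)}{2^k} \quad (k \leq n)$$

corresponding to

$$\varphi(v) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^v e^{-u^2}\,du$$

differs from $\varphi(v)$ by less than

$$\sqrt{\frac{\pi}{2n}}.$$

Since this quantity tends to 0 when $n$ increases indefinitely, we have the
following theorem proved for the first time by Tshebysheff:

The system of infinitely many equations

$$\int_{-\infty}^{\infty} d\varphi(x) = 1; \quad \int_{-\infty}^{\infty} x^{2k-1}\,d\varphi(x) = 0; \quad \int_{-\infty}^{\infty} x^{2k}\,d\varphi(x) = \frac{1 \cdot 3 \cdot 5 \cdots (2k - 1)}{2^k}; \quad k = 1, 2, 3, \ldots$$

uniquely determines a never decreasing function $\varphi(x)$ such that $\varphi(-\infty) = 0$;
namely,

$$\varphi(x) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^x e^{-u^2}\,du.$$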
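The constants of this section lend themselves to an exact check. The sketch below is offered as an illustration, with hypothetical function names: it builds $H_n$ from the recurrence $H_n = -2xH_{n-1} - 2(n - 1)H_{n-2}$, forms $Q_n = c_nH_n$ with $c_n = c_{n-2}/(2n - 2)$, and evaluates $\chi_n(0)$ directly from its definition for $n \geq 1$.

```python
from fractions import Fraction
import math

def hermite(n_max):
    """H_0..H_n_max as coefficient lists (degree 0 first), H_0 = 1, H_1 = -2x."""
    H = [[Fraction(1)], [Fraction(0), Fraction(-2)]]
    for n in range(2, n_max + 1):
        shifted = [Fraction(0)] + [-2 * c for c in H[-1]]              # -2x H_{n-1}
        lower = [2 * (n - 1) * c for c in H[-2]] + [Fraction(0)] * 2   # 2(n-1) H_{n-2}
        H.append([a - b for a, b in zip(shifted, lower)])
    return H

def c(n):
    """Constants c_n with c_0 = 1, c_1 = -1/2, c_n = c_{n-2} / (2n - 2)."""
    if n == 0:
        return Fraction(1)
    if n == 1:
        return Fraction(-1, 2)
    return c(n - 2) / (2 * n - 2)

def chi_at_zero(n, H):
    """chi_n(0) = 1 / (Q'_{n+1}(0) Q_n(0) - Q'_n(0) Q_{n+1}(0)), for n >= 1."""
    def value(k):   # Q_k(0): constant term of c_k H_k
        return c(k) * H[k][0]
    def slope(k):   # Q_k'(0): linear coefficient of c_k H_k
        return c(k) * H[k][1]
    return 1 / (slope(n + 1) * value(n) - slope(n) * value(n + 1))

def ratio(m):
    """(2 * 4 * ... * 2m) / (3 * 5 * ... * (2m + 1))."""
    return Fraction(math.prod(range(2, 2 * m + 1, 2)),
                    math.prod(range(3, 2 * m + 2, 2)))

H = hermite(10)
checks = [(chi_at_zero(n, H), ratio(n // 2)) for n in range(1, 10)]
bounds = [float(chi_at_zero(n, H)) < math.sqrt(math.pi / (2 * n)) for n in range(1, 10)]
```

For $n = 1, 2, \ldots 9$ the computed values of $\chi_n(0)$ agree with the closed expressions above, and each is less than $\sqrt{\pi/2n}$.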


8. Tshebysheff-Markoff's Fundamental Theorem. When a mass $= 1$
is distributed according to the law characterized by a function $F(x, \lambda)$
depending upon a parameter $\lambda$, we say that the distribution is variable.
Notwithstanding the variability of distribution, it may happen that its
moments remain constant. If they are equal to moments of normal
distribution with density

$$\frac{1}{\sqrt{\pi}}e^{-x^2}$$

then by the preceding theorem we have rigorously

$$F(x, \lambda) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^x e^{-u^2}\,du$$

no matter what $\lambda$ is.

Generally moments of a variable distribution are themselves variable.
Suppose that each one of them, when $\lambda$ tends to a certain limit (for
instance $\infty$), tends to the corresponding moment of normal distribution.
One can foresee that under such circumstances $F(x, \lambda)$ will tend to

$$\varphi(x) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^x e^{-u^2}\,du.$$

In fact, the following fundamental theorem holds:

Fundamental Theorem. If, for a variable distribution characterized
by the function $F(x, \lambda)$,

$$\lim \int_{-\infty}^{\infty} x^k\,dF(x, \lambda) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-x^2}x^k\,dx; \quad \lambda \to \infty$$

for any fixed $k = 0, 1, 2, 3, \ldots$, then

$$\lim F(v, \lambda) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^v e^{-x^2}\,dx; \quad \lambda \to \infty$$

uniformly in $v$.

Proof. Let

$$m_0, m_1, m_2, \ldots m_{2n}$$

be $2n + 1$ moments corresponding to a normal distribution. They
allow formation of the polynomials

$$Q_0(x), Q_1(x), \ldots Q_n(x) \text{ and } Q(x)$$

and the function designated in Sec. 6 by $\psi(x)$. Similar entities
corresponding to the variable distribution will be specified by an asterisk.
Since

$$m_k^* \to m_k \text{ as } \lambda \to \infty$$

and since $\Delta_n > 0$, we shall have

$$\Delta_n^* > 0$$

for sufficiently large $\lambda$. Then $F(x, \lambda)$ will have not less than $n + 1$
points of increase and the whole theory can be applied to variable
distribution. In particular, we shall have

$$\begin{aligned} 0 &\leq \varphi(v) - \psi(v - 0) \leq \chi_n(v) \\ 0 &\leq F(v, \lambda) - \psi^*(v - 0) \leq \chi_n^*(v). \end{aligned} \tag{20}$$

Now $Q_s^*(x)$ ($s = 0, 1, 2, \ldots n$) and $Q^*(x)$ depend rationally upon
$m_k^*$ ($k = 0, 1, 2, \ldots 2n$); hence, without any difficulty one can see that

$$Q_s^*(x) \to Q_s(x); \quad s = 0, 1, 2, \ldots n$$

$$Q^*(x) \to Q(x)$$

as $\lambda \to \infty$; whence,

$$\chi_n^*(v) \to \chi_n(v).$$

Again

$$\psi^*(v - 0) \to \psi(v - 0)$$

as $\lambda \to \infty$. A few explanations are necessary to prove this. At first let
$Q_n(v) \neq 0$. Then the polynomial $Q(x)$ will have $n + 1$ roots

$$\xi_1 < \xi_2 < \xi_3 < \cdots < \xi_{n+1}.$$

Since the roots of an algebraic equation vary continuously with its
coefficients, it is evident that for sufficiently large $\lambda$ the equation

$$Q^*(x) = 0$$

will have $n + 1$ roots:

$$\xi_1^* < \xi_2^* < \xi_3^* < \cdots < \xi_{n+1}^*$$

and $\xi_k^*$ will tend to $\xi_k$ as $\lambda \to \infty$. In this case, it is evident that $\psi^*(v - 0)$
will tend to $\psi(v - 0)$. If $Q_n(v) = 0$, it may happen that $\xi_1^*$ or $\xi_{n+1}^*$ tends
respectively to $-\infty$ or $+\infty$ as $\lambda \to \infty$, while the other roots tend to the
roots

$$x_1, x_2, \ldots x_n$$

of the equation

$$Q_n(x) = 0.$$

But the terms in $\psi^*(v - 0)$ corresponding to infinitely increasing roots
tend to 0, and again

$$\psi^*(v - 0) \to \psi(v - 0).$$

Now

$$\chi_n(v) < \sqrt{\frac{\pi}{2n}}.$$

Consequently, given an arbitrary positive number $\epsilon$, we can select $n$ so
large as to have

$$\sqrt{\frac{\pi}{2n}} < \epsilon.$$
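Having selected $n$ in this manner, we shall keep it fixed. Then by the
preceding remarks a number $L$ can be found so that

$$\chi_n^*(v) < \sqrt{\frac{\pi}{2n}} < \epsilon$$

$$|\psi^*(v - 0) - \psi(v - 0)| < \epsilon$$

for $\lambda > L$. Combining this with inequalities (20), we find

$$|F(v, \lambda) - \varphi(v)| < 3\epsilon$$

for $\lambda > L$. And this proves the convergence of $F(v, \lambda)$ to $\varphi(v)$ for a
fixed arbitrary $v$. To show that the equation

$$\lim F(v, \lambda) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^v e^{-x^2}\,dx$$

holds uniformly for a variable $v$ we can follow a very simple reasoning due
to Pólya. Since $\varphi(-\infty) = 0$, $\varphi(+\infty) = 1$ and $\varphi(x)$ is an increasing
function, one can determine two numbers $a_0$ and $a_n$ so that

$$\varphi(x) \leq \varphi(a_0) < \frac{\epsilon}{2} \quad \text{for } x \leq a_0$$

$$1 - \varphi(x) \leq 1 - \varphi(a_n) < \frac{\epsilon}{2} \quad \text{for } x \geq a_n.$$

Next, because $\varphi(x)$ is a continuous function, the interval $(a_0, a_n)$ can be
subdivided into partial intervals by inserting between $a_0$ and $a_n$ points
$a_1 < a_2 < \cdots < a_{n-1}$ so that

$$0 < \varphi(a_{k+1}) - \varphi(a_k) < \frac{\epsilon}{2}$$

for $k = 0, 1, 2, \ldots n - 1$. By the preceding result, for all sufficiently
large $\lambda$

$$F(a_0, \lambda) < \frac{\epsilon}{2}; \quad 1 - F(a_n, \lambda) < \frac{\epsilon}{2}$$

and

$$|F(a_k, \lambda) - \varphi(a_k)| < \frac{\epsilon}{2}; \quad k = 1, 2, \ldots n - 1.$$

Now consider the interval $(-\infty, a_0)$. Here for $v \leq a_0$

$$0 \leq F(v, \lambda) < \frac{\epsilon}{2}, \quad 0 < \varphi(v) < \frac{\epsilon}{2}$$

and

$$|F(v, \lambda) - \varphi(v)| < \epsilon.$$

For $v$ belonging to the interval $(a_n, +\infty)$

$$0 \leq 1 - F(v, \lambda) < \frac{\epsilon}{2}, \quad 0 < 1 - \varphi(v) < \frac{\epsilon}{2}$$

whence again

$$|F(v, \lambda) - \varphi(v)| < \epsilon.$$

Finally, let

$$a_k \leq v < a_{k+1} \quad (k = 0, 1, 2, \ldots n - 1).$$

Then

$$F(v, \lambda) - \varphi(v) \geq F(a_k, \lambda) - \varphi(a_{k+1}) = [F(a_k, \lambda) - \varphi(a_k)] + [\varphi(a_k) - \varphi(a_{k+1})]$$

$$F(v, \lambda) - \varphi(v) \leq F(a_{k+1}, \lambda) - \varphi(a_k) = [F(a_{k+1}, \lambda) - \varphi(a_{k+1})] + [\varphi(a_{k+1}) - \varphi(a_k)].$$

But

$$F(a_k, \lambda) - \varphi(a_k) > -\frac{\epsilon}{2}; \quad F(a_{k+1}, \lambda) - \varphi(a_{k+1}) < \frac{\epsilon}{2}; \quad \varphi(a_{k+1}) - \varphi(a_k) < \frac{\epsilon}{2}$$

whence

$$-\epsilon < F(v, \lambda) - \varphi(v) < \epsilon.$$

Thus, given $\epsilon$, there exists a number $L(\epsilon)$ depending upon $\epsilon$ alone and
such that

$$|F(v, \lambda) - \varphi(v)| < \epsilon$$

for $\lambda > L(\epsilon)$ no matter what value is attributed to $v$.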

The fundamental theorem with reference to probability can be stated 
as follows: 

Let $s_n$ be a stochastic variable depending upon a variable positive integer
n. If the mathematical expectation $E(s_n^k)$ for any fixed $k = 1, 2, 3, \ldots$
tends, as n increases indefinitely, to the corresponding expectation

$$E(x^k) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} x^k e^{-x^2}\,dx$$

of a normally distributed variable, then the probability of the inequality

$$s_n < v$$

tends to the limit

$$\frac{1}{\sqrt{\pi}} \int_{-\infty}^{v} e^{-x^2}\,dx$$

and that uniformly in v.
In very many cases it is much easier to make sure that the conditions
of this theorem are fulfilled and then, in one stroke, to pass to the limit 
theorem for probability, than to attack the problem directly. 
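
The limiting moments that the theorem refers to can be tabulated directly. The following is a minimal sketch, not part of the original text: the function names are illustrative, and the closed form for even k is the standard value of the Gaussian integral for the density $e^{-x^2}/\sqrt{\pi}$. It compares that closed form with a plain midpoint-rule quadrature.

```python
import math


def normal_moment(k):
    """k-th moment of the density exp(-x^2)/sqrt(pi) (closed form)."""
    if k % 2 == 1:
        return 0.0  # odd moments vanish by symmetry
    return math.factorial(k) / (2 ** k * math.factorial(k // 2))


def numeric_moment(k, half_width=8.0, steps=16000):
    """Midpoint-rule approximation of the same moment on [-half_width, half_width]."""
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        total += x ** k * math.exp(-x * x)
    return total * h / math.sqrt(math.pi)


if __name__ == "__main__":
    for k in range(1, 7):
        print(k, normal_moment(k), round(numeric_moment(k), 9))
```

A sequence of variables whose moments approach these numbers for every fixed k falls under the theorem; the code only illustrates what the target values are.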


Application to Sums of Independent Variables 

9. Let $z_1, z_2, z_3, \ldots$ be independent variables whose number can be
increased indefinitely. Without losing anything in generality, we may
suppose from the beginning

$$E(z_k) = 0; \quad k = 1, 2, 3, \ldots$$

We assume the existence of

$$E(z_k^2) = b_k$$

for all $k = 1, 2, 3, \ldots$ Also, we assume for some positive $\delta$ the
existence of absolute moments

$$E|z_k|^{2+\delta} = \mu_k^{(2+\delta)}; \quad k = 1, 2, 3, \ldots$$

Liapounoff's theorem, with which we dealt at length in Chap. XIV,
states that the probability of the inequality

$$\frac{z_1 + z_2 + \cdots + z_n}{\sqrt{2B_n}} < t,$$

where

$$B_n = b_1 + b_2 + \cdots + b_n,$$

tends uniformly to the limit

$$\frac{1}{\sqrt{\pi}} \int_{-\infty}^{t} e^{-x^2}\,dx$$

as $n \to \infty$, provided

$$\frac{\mu_1^{(2+\delta)} + \mu_2^{(2+\delta)} + \cdots + \mu_n^{(2+\delta)}}{B_n^{1+\frac{\delta}{2}}} \to 0.$$

Liapounoff's result in regard to generality of conditions surpassed by
far what had been established before by Tshebysheff and Markoff, whose
proofs were based on the fundamental result derived in the preceding
section. Since Liapounoff's conditions do not require the existence of
moments in an infinite number, it seemed that the method of moments
was not powerful enough to establish the limit theorem in such a general
form. Nevertheless, by resorting to an ingenious artifice, of which we
made use in Chap. X, Sec. 8, Markoff finally succeeded in proving the
limit theorem by the method of moments to the same degree of generality
as did Liapounoff.
Markoff's artifice consists in associating with the variable $z_k$ two new
variables $x_k$ and $y_k$ defined as follows:

Let N be a positive number which in the course of proof will be
selected so as to tend to infinity together with n. Then

$$x_k = z_k, \quad y_k = 0 \quad \text{if } |z_k| \leq N$$

$$x_k = 0, \quad y_k = z_k \quad \text{if } |z_k| > N.$$

Evidently $z_k$, $x_k$, $y_k$ are connected by the relation

$$z_k = x_k + y_k$$

whence

$$(21) \qquad E(x_k) + E(y_k) = 0.$$

Moreover

$$(22) \qquad E(x_k^2) + E(y_k^2) = E(z_k^2) = b_k$$

$$E|x_k|^{2+\delta} + E|y_k|^{2+\delta} = E|z_k|^{2+\delta} = \mu_k^{(2+\delta)}$$

as one can see immediately from the definition of $x_k$ and $y_k$.
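
Relations (21) and (22) can be checked exactly on a small finite distribution. The sketch below is illustrative only: the four-point distribution, the cutoff N = 2, and the helper names `split` and `expect` are assumptions made for the example, not taken from the text.

```python
from fractions import Fraction


def split(z, cutoff):
    """Markoff's splitting of z into (x, y) with z = x + y and x * y = 0."""
    return (z, 0) if abs(z) <= cutoff else (0, z)


def expect(dist, func):
    """Expectation of func(z) for a finite distribution of (value, probability) pairs."""
    return sum(p * func(z) for z, p in dist)


# Hypothetical example: a symmetric variable with E(z) = 0.
dist = [(-3, Fraction(1, 8)), (-1, Fraction(3, 8)),
        (1, Fraction(3, 8)), (3, Fraction(1, 8))]
N = 2

mean_x = expect(dist, lambda z: split(z, N)[0])
mean_y = expect(dist, lambda z: split(z, N)[1])
second_x = expect(dist, lambda z: split(z, N)[0] ** 2)
second_y = expect(dist, lambda z: split(z, N)[1] ** 2)
second_z = expect(dist, lambda z: z ** 2)

if __name__ == "__main__":
    print("E(x) + E(y) =", mean_x + mean_y)
    print("E(x^2) + E(y^2) =", second_x + second_y, " E(z^2) =", second_z)
```

Because x and y are never both different from zero, every power of z splits into the same power of x plus the same power of y, which is why the second equation of (22) holds for any exponent.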
Since $x_k$ is bounded, mathematical expectations

$$E(x_k^l)$$

exist for all integer exponents $l = 1, 2, 3, \ldots$ and for $k = 1, 2, 3, \ldots$
In the following we shall use the notations

$$|E(x_k^l)| = c_k^{(l)}; \quad l = 1, 2, 3, \ldots$$

$$c_1^{(2)} + c_2^{(2)} + \cdots + c_n^{(2)} = B_n'$$

$$\mu_1^{(2+\delta)} + \mu_2^{(2+\delta)} + \cdots + \mu_n^{(2+\delta)} = C_n.$$

Not to obscure the essential steps of the reasoning we shall first 
establish a few preliminary results. 

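Lemma 1. Let $q_k$ represent the probability that $y_k \neq 0$; then

$$q_1 + q_2 + \cdots + q_n \leq \frac{C_n}{N^{2+\delta}}.$$

Proof. Let $\varphi_k(x)$ be the distribution function of $z_k$. Since $y_k \neq 0$
only if $|z_k| > N$, the probability $q_k$ is not greater than

$$\int_{-\infty}^{-N} d\varphi_k(x) + \int_{N}^{\infty} d\varphi_k(x).$$

On the other hand,

$$\mu_k^{(2+\delta)} = \int_{-\infty}^{\infty} |x|^{2+\delta}\,d\varphi_k(x) \geq \int_{-\infty}^{-N} |x|^{2+\delta}\,d\varphi_k(x) + \int_{N}^{\infty} |x|^{2+\delta}\,d\varphi_k(x).$$

But

$$\int_{-\infty}^{-N} |x|^{2+\delta}\,d\varphi_k(x) + \int_{N}^{\infty} |x|^{2+\delta}\,d\varphi_k(x) \geq N^{2+\delta} \left[ \int_{-\infty}^{-N} d\varphi_k(x) + \int_{N}^{\infty} d\varphi_k(x) \right]$$

whence

$$q_k \leq \frac{\mu_k^{(2+\delta)}}{N^{2+\delta}}.$$

The inequality to be proved follows immediately.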
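
For a single variable, the bound of Lemma 1 is a Tshebysheff-type tail estimate and is easy to verify on an example. This is a hedged sketch: the distribution, the choice $\delta = 1$, the cutoff N = 2, and the helper names are all assumptions made for illustration.

```python
from fractions import Fraction


def tail_probability(dist, cutoff):
    """Probability q that |z| exceeds the cutoff, i.e. that y differs from 0."""
    return sum(p for z, p in dist if abs(z) > cutoff)


def absolute_moment(dist, order):
    """Absolute moment E|z|^order of a finite distribution."""
    return sum(p * abs(z) ** order for z, p in dist)


# Hypothetical example distribution with E(z) = 0.
dist = [(-3, Fraction(1, 8)), (-1, Fraction(3, 8)),
        (1, Fraction(3, 8)), (3, Fraction(1, 8))]
N, delta = 2, 1

q = tail_probability(dist, N)
bound = absolute_moment(dist, 2 + delta) / Fraction(N) ** (2 + delta)

if __name__ == "__main__":
    print("q =", q, " bound =", bound)
```

Summing such single-variable bounds over k = 1, ..., n gives the inequality of the lemma.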
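Lemma 2. The following inequality holds:

$$1 \geq \frac{B_n'}{B_n} \geq 1 - \frac{C_n}{B_n N^{\delta}}.$$

Proof. From

$$E|y_k|^{2+\delta} \leq \mu_k^{(2+\delta)}$$

which is a consequence of the second equation (22) it follows that

$$E(y_k^2) \leq \frac{\mu_k^{(2+\delta)}}{N^{\delta}}.$$

The first equation (22)

$$c_k^{(2)} + E(y_k^2) = b_k$$

gives

$$b_k \geq c_k^{(2)} \geq b_k - \frac{\mu_k^{(2+\delta)}}{N^{\delta}}.$$

Taking the sum for $k = 1, 2, 3, \ldots, n$, we get

$$B_n \geq B_n' \geq B_n - \frac{C_n}{N^{\delta}}$$

whence

$$1 \geq \frac{B_n'}{B_n} \geq 1 - \frac{C_n}{B_n N^{\delta}}.$$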


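Lemma 3. For $e \geq 3$,

$$\frac{c_1^{(e)} + c_2^{(e)} + \cdots + c_n^{(e)}}{B_n^{\frac{e}{2}}} \leq \left( \frac{N^2}{B_n} \right)^{\frac{e}{2} - 1}.$$

Proof. This inequality follows immediately from the evident
inequalities

$$c_k^{(e)} \leq E|x_k|^e \leq N^{e-2} E(x_k^2) \leq N^{e-2} b_k.$$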
Lemma 4. The following inequality holds

$$\frac{c_1^{(1)} + c_2^{(1)} + \cdots + c_n^{(1)}}{B_n^{\frac{1}{2}}} \leq \left( \frac{C_n}{N^{2+\delta}} \right)^{\frac{1}{2}}.$$

Proof. Since

$$E(x_k) + E(y_k) = 0,$$

we have

$$c_k^{(1)} = |E(x_k)| = |E(y_k)| \leq E|y_k|.$$

On the other hand, by virtue of Schwarz's inequality

$$[E|y_1| + E|y_2| + \cdots + E|y_n|]^2 \leq (q_1 + q_2 + \cdots + q_n) \sum_{k=1}^{n} E(y_k^2)$$

whence the statement follows immediately.

If the variable integer N should be subject to the requirements that
both the ratios

$$\frac{C_n}{N^{2+\delta}} \quad \text{and} \quad \frac{N^2}{B_n}$$

should tend to 0 when n increases indefinitely, then the preceding lemmas
would give three important corollaries. But before stating these
corollaries we must ascertain the possibility of selecting N as required.
It suffices to take


Then 


1 

N == 


^ 

Bn 





by virtue of Liapounoff's condition.
Also 



will tend to 0. By selecting N in this manner we can state the following 
corollaries: 

Corollary 1. The sum

$$q_1 + q_2 + \cdots + q_n$$

tends to 0 as $n \to \infty$.

Corollary 2. The ratio

$$\frac{B_n'}{B_n}$$

tends to 1.

Corollary 3. The ratio

$$\frac{c_1^{(e)} + c_2^{(e)} + \cdots + c_n^{(e)}}{B_n^{\frac{e}{2}}}$$

tends to 0 for all positive integer exponents e except e = 2.

10. Let $F_n(t)$ and $\varphi_n(t)$ represent, respectively, the probabilities of the
inequalities

$$\frac{z_1 + z_2 + \cdots + z_n}{\sqrt{2B_n}} < t,$$

$$\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}} < t.$$

By repeating the reasoning developed in Chap. X, Sec. 8, we find that

$$|F_n(t) - \varphi_n(t)| \leq q_1 + q_2 + \cdots + q_n.$$

Hence,

$$\lim\, (F_n(t) - \varphi_n(t)) = 0 \quad \text{as } n \to \infty$$

by Corollary 1. It suffices therefore to show

$$\varphi_n(t) \to \frac{1}{\sqrt{\pi}} \int_{-\infty}^{t} e^{-x^2}\,dx \quad \text{as } n \to \infty,$$

and that can be done by the method of moments. By the polynomial
theorem

$$\left( \frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}} \right)^m = \sum \frac{m!}{\alpha!\,\beta! \cdots \lambda!}\, \frac{S_{\alpha, \beta, \ldots, \lambda}}{2^{\frac{m}{2}} B_n^{\frac{m}{2}}}$$

where the summation extends over all systems of positive integers
$\alpha \geq \beta \geq \cdots \geq \lambda$ satisfying the condition

$$\alpha + \beta + \cdots + \lambda = m$$

and $S_{\alpha, \beta, \ldots, \lambda}$ denotes a symmetrical function of letters $x_1, x_2, \ldots, x_n$
determined by one of its terms

$$x_1^{\alpha} x_2^{\beta} \cdots x_l^{\lambda}$$

if l represents the number of integers $\alpha, \beta, \ldots, \lambda$. Since variables
$x_1, x_2, \ldots, x_n$ are independent, we have

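$$E\left( \frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}} \right)^m = \sum \frac{m!}{\alpha!\,\beta! \cdots \lambda!}\, \frac{G_{\alpha, \beta, \ldots, \lambda}}{2^{\frac{m}{2}} B_n^{\frac{m}{2}}}$$

where $G_{\alpha, \beta, \ldots, \lambda}$ is obtained by replacing powers of variables by
mathematical expectations of these powers. It is almost evident that

$$\frac{|G_{\alpha, \beta, \ldots, \lambda}|}{B_n^{\frac{m}{2}}} \leq \frac{c_1^{(\alpha)} + c_2^{(\alpha)} + \cdots + c_n^{(\alpha)}}{B_n^{\frac{\alpha}{2}}} \cdot \frac{c_1^{(\beta)} + c_2^{(\beta)} + \cdots + c_n^{(\beta)}}{B_n^{\frac{\beta}{2}}} \cdots \frac{c_1^{(\lambda)} + c_2^{(\lambda)} + \cdots + c_n^{(\lambda)}}{B_n^{\frac{\lambda}{2}}}.$$

Now if not all the exponents $\alpha, \beta, \ldots, \lambda$ are = 2 (which is possible
only when m is even), by virtue of Corollary 3 the right member as well as

$$\frac{G_{\alpha, \beta, \ldots, \lambda}}{B_n^{\frac{m}{2}}}$$

tends to 0. Hence

$$E\left( \frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}} \right)^m \to 0$$

if m is odd.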

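But for even m we have

$$(23) \qquad E\left( \frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}} \right)^m - \frac{m!}{2^m}\, \frac{G_{2, 2, \ldots, 2}}{B_n^{\frac{m}{2}}} \to 0.$$

Let us consider now (m being even)

$$\left( \frac{B_n'}{B_n} \right)^{\frac{m}{2}} = \left( \frac{c_1^{(2)} + c_2^{(2)} + \cdots + c_n^{(2)}}{B_n} \right)^{\frac{m}{2}} = \sum \frac{\left( \frac{m}{2} \right)!}{\lambda!\,\mu! \cdots \omega!}\, \frac{H_{\lambda, \mu, \ldots, \omega}}{B_n^{\frac{m}{2}}}$$

where summation extends over all systems of positive integers

$$\lambda \geq \mu \geq \cdots \geq \omega$$

satisfying the condition

$$\lambda + \mu + \cdots + \omega = \frac{m}{2}$$

and $H_{\lambda, \mu, \ldots, \omega}$ is a symmetric function of $c_1^{(2)}, c_2^{(2)}, \ldots, c_n^{(2)}$ determined by
its term

$$(c_1^{(2)})^{\lambda} (c_2^{(2)})^{\mu} \cdots (c_l^{(2)})^{\omega}$$

(l being the number of subscripts $\lambda, \mu, \ldots, \omega$). Apparently

$$H_{\lambda, \mu, \ldots, \omega} \leq \left[ (c_1^{(2)})^{\lambda} + \cdots + (c_n^{(2)})^{\lambda} \right] \left[ (c_1^{(2)})^{\mu} + \cdots + (c_n^{(2)})^{\mu} \right] \cdots \left[ (c_1^{(2)})^{\omega} + \cdots + (c_n^{(2)})^{\omega} \right].$$

Besides

$$c_k^{(2)} \leq N^2, \quad (c_k^{(2)})^e \leq N^{2e-2} b_k$$

and

$$\frac{(c_1^{(2)})^e + (c_2^{(2)})^e + \cdots + (c_n^{(2)})^e}{B_n^e} \leq \left( \frac{N^2}{B_n} \right)^{e-1}$$

if e > 1. Thus

$$\frac{H_{\lambda, \mu, \ldots, \omega}}{B_n^{\frac{m}{2}}} \to 0$$

if not all subscripts $\lambda, \mu, \ldots, \omega$ are equal to 1. It follows that

$$\left( \frac{B_n'}{B_n} \right)^{\frac{m}{2}} - \left( \frac{m}{2} \right)!\, \frac{H_{1, 1, \ldots, 1}}{B_n^{\frac{m}{2}}} \to 0.$$

But by Corollary 2

$$\frac{B_n'}{B_n} \to 1$$

and evidently $H_{1, 1, \ldots, 1} = G_{2, 2, \ldots, 2}$. Hence

$$\left( \frac{m}{2} \right)!\, \frac{G_{2, 2, \ldots, 2}}{B_n^{\frac{m}{2}}} \to 1$$

and this in connection with (23) shows that for an even m

$$E\left( \frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}} \right)^m \to \frac{m!}{2^m \left( \frac{m}{2} \right)!}.$$

Finally, no matter whether the exponent m is odd or even, we have

$$\lim E\left( \frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}} \right)^m = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} x^m e^{-x^2}\,dx.$$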
Tshebysheff-Markoff's fundamental theorem can be applied directly
and leads to the result:

$$\lim \varphi_n(t) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{t} e^{-x^2}\,dx$$

uniformly in t. On the other hand, as has been established before,

$$\lim\, [F_n(t) - \varphi_n(t)] = 0$$

uniformly in t. Hence, finally

$$\lim F_n(t) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{t} e^{-x^2}\,dx$$

uniformly in t.

And this is the fundamental limit theorem with Liapounoff's conditions
now proved by the method of moments. This proof, due to
Markoff, is simple enough and of high elegance. However, preliminary
considerations which underlie the proof of the fundamental theorem,
though simple and elegant also, are rather long. Nevertheless, we must
bear in mind that they are not only useful in connection with the theory
of probability, but they have great importance in other fields of analysis.
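
The convergence of moments established above can be watched on the simplest bounded case. The sketch below is an illustration under stated assumptions, not part of Markoff's proof: it takes independent variables equal to +1 or -1 with probability 1/2 each (so that $b_k = 1$ and $B_n = n$), computes the even moments of the normalized sum exactly, and compares them with the limiting values. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def sum_moment(n, m):
    """Exact E[(z_1 + ... + z_n)^m] for independent z_k = +1 or -1, each with probability 1/2."""
    total = 0
    for plus in range(n + 1):
        s = 2 * plus - n                      # value of the sum with `plus` positive terms
        total += comb(n, plus) * s ** m
    return Fraction(total, 2 ** n)


def normalized_moment(n, m):
    """Exact m-th moment (m even) of the sum divided by sqrt(2 * B_n), with B_n = n."""
    return sum_moment(n, m) / Fraction((2 * n) ** (m // 2))


def limit_moment(m):
    """Limiting value m! / (2^m (m/2)!) for even m."""
    return Fraction(factorial(m), 2 ** m * factorial(m // 2))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, float(normalized_moment(n, 4)), float(limit_moment(4)))
```

For m = 4 the exact value works out to 3/4 - 1/(2n), so the approach to the limit 3/4 is visible directly.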
ON A GAUSSIAN PROBLEM

1. In a letter to Laplace dated January 30, 1812,¹ Gauss mentions a
difficult problem in probability for which he could not find a perfectly
satisfactory solution. We quote from his letter:

Je me rappelle pourtant d'un problème curieux duquel je me suis occupé il y
a 12 ans, mais lequel je n'ai pas réussi alors à résoudre à ma satisfaction. Peut-être
daignerez-vous en occuper quelques moments: dans ce cas je suis sûr que vous
trouverez une solution plus complète. La voici: Soit M une quantité inconnue
entre les limites 0 et 1 pour laquelle toutes les valeurs sont ou également probables
ou plus ou moins selon une loi donnée: qu'on la suppose convertie en une fraction
continue


Quelle est la probabilité qu'en s'arrêtant dans le développement à un terme fini
la fraction suivante

$$\cfrac{1}{a^{(n+1)} + \cfrac{1}{a^{(n+2)} + \cdots}}$$

soit entre les limites 0 et x? Je la désigne par P(n, x) et j'ai en supposant toutes
les valeurs également probables

$$P(0, x) = x.$$

P(1, x) est une function transcendante dépendant de la function

$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{x}$$

que Euler nomme inexplicable et sur laquelle je viens de donner plusieurs recherches
dans un mémoire présenté à notre Société des Sciences qui sera bientôt
imprimé. Mais pour le cas où n est plus grand, la valeur exacte de P(n, x) semble
intraitable. Cependant j'ai trouvé par des raisonnements très simples que pour
n infinie

$$P(n, x) = \frac{\log (1 + x)}{\log 2}.$$

Mais les efforts que j'ai fait lors de mes recherches pour assigner

$$P(n, x) - \frac{\log (1 + x)}{\log 2}$$

pour une valeur très grande de n, mais pas infinie, ont été infructueux.

¹ Gauss' Werke, X, 1, p. 371.

The problem itself and the main difficulty in its solution are clearly 
indicated in this passage. The problem is difficult indeed, and no 
satisfactory solution was offered before 1928, when Professor R. O. 
Kuzmin succeeded in solving it in a very remarkable and elegant way. 

2. Analytical Expression for $P_n(x)$. We shall use the notation
$P_n(x)$ for the probability which Gauss designated by P(n, x). The first
question that presents itself is how to express $P_n(x)$ in a proper analytical
form. Let $\delta(v_1, v_2, \ldots, v_n, x)$ be an interval whose end points are
represented by two continued fractions:

$$\cfrac{1}{v_1 + \cfrac{1}{v_2 + \ddots + \cfrac{1}{v_n}}}$$

and

$$\cfrac{1}{v_1 + \cfrac{1}{v_2 + \ddots + \cfrac{1}{v_n + x}}}$$

with positive integer incomplete quotients $v_1, v_2, \ldots, v_n$, while x is a
positive number $\leq 1$. Two such intervals corresponding to two different
systems of integers $v_1, v_2, \ldots, v_n$ and $v_1', v_2', \ldots, v_n'$ do not overlap;
that is, do not have common inner points. For, if they had a common
inner point represented by an irrational number N (which we can always
suppose), we should have for some positive $x' < 1$ and $x'' < 1$

$$N = \cfrac{1}{v_1 + \cfrac{1}{v_2 + \ddots + \cfrac{1}{v_n + x'}}} = \cfrac{1}{v_1' + \cfrac{1}{v_2' + \ddots + \cfrac{1}{v_n' + x''}}}.$$

But that is impossible unless $v_1' = v_1$, $v_2' = v_2$, $\ldots$, $v_n' = v_n$.

A number M being selected at random between 0 and 1 and converted
into a continued fraction

$$M = \cfrac{1}{v_1 + \cfrac{1}{v_2 + \ddots + \cfrac{1}{v_n + \xi}}}$$

if the quantity $\xi$ turns out to be contained between 0 and x < 1, M must
belong to one (and only one) of the intervals $\delta(v_1, v_2, \ldots, v_n, x)$
corresponding to one of all the possible systems of n positive integers
$v_1, v_2, \ldots, v_n$. Since M has a uniform distribution of probability and
since the length of the interval $\delta(v_1, v_2, \ldots, v_n, x)$ is

$$(-1)^n \left[ \cfrac{1}{v_1 + \cfrac{1}{v_2 + \ddots + \cfrac{1}{v_n + x}}} - \cfrac{1}{v_1 + \cfrac{1}{v_2 + \ddots + \cfrac{1}{v_n}}} \right]$$

the required probability $P_n(x)$ will be expressed by the sum

$$P_n(x) = \sum_{v_1, v_2, \ldots, v_n} (-1)^n \left[ \cfrac{1}{v_1 + \cfrac{1}{v_2 + \ddots + \cfrac{1}{v_n + x}}} - \cfrac{1}{v_1 + \cfrac{1}{v_2 + \ddots + \cfrac{1}{v_n}}} \right]$$

extended over all systems of positive integers $v_1, v_2, \ldots, v_n$. In general
let

$$\frac{P_i}{Q_i} = \cfrac{1}{v_1 + \cfrac{1}{v_2 + \ddots + \cfrac{1}{v_i}}} \quad (i = 1, 2, \ldots, n)$$

be a convergent to the continued fraction

$$\cfrac{1}{v_1 + \cfrac{1}{v_2 + \ddots + \cfrac{1}{v_n}}}.$$

Then the above expression for $P_n(x)$ can be exhibited in a more convenient
form:

$$(1) \qquad P_n(x) = \sum_{v_1, v_2, \ldots, v_n} \frac{x}{Q_n (Q_n + x Q_{n-1})}.$$

By the very definition of $P_n(x)$ we must have $P_n(1) = 1$; hence the
important relation

$$(2) \qquad \sum \frac{1}{Q_n (Q_n + Q_{n-1})} = 1.$$

This result can also be established directly by resorting to the original
expression of $P_n(1)$ and performing summation first with respect to $v_n$,
then with respect to $v_{n-1}$, etc.

Relation (2) can be interpreted as follows: Let $\delta$ in general be the
length of an interval $\delta(v_1, v_2, \ldots, v_n, 1)$. Then

$$\sum \delta = 1$$

summation being extended over the (enumerable) set of intervals $\delta$.
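
Relation (2) lends itself to a direct numerical check. The following sketch assumes the usual recurrence $Q_i = v_i Q_{i-1} + Q_{i-2}$ with $Q_0 = 1$, $Q_1 = v_1$ (used later in the text), and sums the interval lengths $1/[Q_n(Q_n + Q_{n-1})]$ over all systems of quotients not exceeding a bound; the function names and the bounds are illustrative choices.

```python
from fractions import Fraction
from itertools import product


def denominators(quotients):
    """Return (Q_{n-1}, Q_n) for the continued fraction 1/(v_1 + 1/(v_2 + ...))."""
    q_prev, q = 1, quotients[0]            # Q_0 = 1, Q_1 = v_1
    for v in quotients[1:]:
        q_prev, q = q, v * q + q_prev      # Q_i = v_i Q_{i-1} + Q_{i-2}
    return q_prev, q


def partial_sum(n, bound):
    """Sum of the lengths 1/(Q_n (Q_n + Q_{n-1})) over quotients 1..bound."""
    total = Fraction(0)
    for vs in product(range(1, bound + 1), repeat=n):
        q_prev, q = denominators(vs)
        total += Fraction(1, q * (q + q_prev))
    return total


if __name__ == "__main__":
    print(partial_sum(1, 999))             # telescopes to 999/1000
    print(float(partial_sum(2, 60)))       # below 1, approaching 1 as the bound grows
```

For n = 1 the sum telescopes, which gives an exact value for every bound; for larger n the partial sums stay below 1 and increase toward it.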
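3. The Derivative of $P_n(x)$. In attempting to show that $P_n(x)$
tends uniformly to a limit function as $n \to \infty$ it is easier to begin with its
derivative $p_n(x)$. Series

$$\sum \frac{1}{(Q_n + x Q_{n-1})^2}$$

obtained by formal derivation of (1) is uniformly convergent in the
interval (0, 1). For

$$Q_{n-1} \leq Q_n$$

whence

$$\frac{1}{(Q_n + x Q_{n-1})^2} < \frac{2}{Q_n (Q_n + Q_{n-1})}$$

and the series

$$\sum \frac{2}{Q_n (Q_n + Q_{n-1})} = 2$$

is convergent. Hence

$$\frac{dP_n(x)}{dx} = p_n(x) = \sum \frac{1}{(Q_n + x Q_{n-1})^2}.$$

Since

$$Q_n = v_n Q_{n-1} + Q_{n-2}$$

we have

$$p_n(x) = \sum_{v_1, v_2, \ldots, v_n} \frac{1}{\left( Q_{n-1} + \dfrac{Q_{n-2}}{v_n + x} \right)^2} \cdot \frac{1}{(v_n + x)^2}$$

and, performing summation with respect to $v_1, v_2, \ldots, v_{n-1}$ for constant
$v_n$

$$\sum_{v_1, \ldots, v_{n-1}} \frac{1}{\left( Q_{n-1} + \dfrac{Q_{n-2}}{v_n + x} \right)^2} = p_{n-1}\left( \frac{1}{v_n + x} \right)$$

whence

$$p_n(x) = \sum_{v_n = 1}^{\infty} p_{n-1}\left( \frac{1}{v_n + x} \right) \frac{1}{(v_n + x)^2}$$

or else

$$(3) \qquad p_n(x) = \sum_{v=1}^{\infty} p_{n-1}\left( \frac{1}{v + x} \right) \frac{1}{(v + x)^2}$$

an important recurrence relation which permits determining completely
the sequence of functions

$$p_1(x), p_2(x), p_3(x), \ldots$$

starting with $p_0(x) = 1$.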

4. Discussion of a More General Recurrence Relation. In discussing
relation (3) the fact that $p_0(x) = 1$ is of no consequence. We may start
with any function $f_0(x)$ subject to some natural limitations, and form a
sequence

$$f_1(x), f_2(x), f_3(x), \ldots$$

by means of the recurrence relation

$$(4) \qquad f_n(x) = \sum_{v=1}^{\infty} f_{n-1}\left( \frac{1}{v + x} \right) \frac{1}{(v + x)^2}.$$

The following properties of $f_n(x)$ follow easily from this relation:
a. If

$$f_0(x) = \frac{1}{1 + x}$$

then

$$f_n(x) = \frac{1}{1 + x}; \quad n = 1, 2, 3, \ldots$$

For

$$f_1(x) = \sum_{v=1}^{\infty} \frac{1}{(v + x)(v + x + 1)} = \frac{1}{1 + x}$$

whence the general statement follows immediately.
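b. If

$$\frac{m}{1 + x} \leq f_0(x) \leq \frac{M}{1 + x}$$

then

$$\frac{m}{1 + x} \leq f_n(x) \leq \frac{M}{1 + x}.$$

Follows from (a) and equation (4) itself.

As a corollary we have: Let $M_n$ and $m_n$ be the precise upper and
lower bounds of

$$(1 + x) f_n(x) \quad (n = 0, 1, 2, \ldots)$$

in the interval $0 \leq x \leq 1$. Then

$$M_0 \geq M_1 \geq M_2 \geq \cdots$$
$$m_0 \leq m_1 \leq m_2 \leq \cdots$$

c. We have

$$\int_0^1 f_n(x)\,dx = \sum_{v=1}^{\infty} \int_0^1 f_{n-1}\left( \frac{1}{v + x} \right) \frac{dx}{(v + x)^2} = \int_0^1 f_{n-1}(x)\,dx.$$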

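d. The following relations can easily be established by mathematical
induction:

$$f_n(x) = \sum f_0\left( \frac{P_n + x P_{n-1}}{Q_n + x Q_{n-1}} \right) \frac{1}{(Q_n + x Q_{n-1})^2}$$

$$f_{2n}(x) = \sum f_n\left( \frac{P_n + x P_{n-1}}{Q_n + x Q_{n-1}} \right) \frac{1}{(Q_n + x Q_{n-1})^2}$$

$$f_{3n}(x) = \sum f_{2n}\left( \frac{P_n + x P_{n-1}}{Q_n + x Q_{n-1}} \right) \frac{1}{(Q_n + x Q_{n-1})^2}$$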
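
Property (a) above rests on a telescoping sum, and that can be confirmed with exact rational arithmetic. The sketch below is illustrative only (the function name and the truncation at a finite number of terms are assumptions): it applies the right-hand side of recurrence (4) to $f_0(x) = 1/(1 + x)$ and compares the partial sum with its closed form.

```python
from fractions import Fraction


def apply_to_invariant(x, terms):
    """Partial sum of recurrence (4) applied to f_0(x) = 1/(1 + x), for rational x."""
    total = Fraction(0)
    for v in range(1, terms + 1):
        u = 1 / (v + x)                    # argument 1/(v + x)
        total += (1 / (1 + u)) * u * u     # f_0(u) / (v + x)^2 = 1/((v + x)(v + x + 1))
    return total


if __name__ == "__main__":
    x = Fraction(1, 3)
    partial = apply_to_invariant(x, 50)
    # The partial sum equals 1/(1 + x) - 1/(terms + 1 + x), so the remainder is explicit.
    print(partial, 1 / (1 + x) - partial)
```

The remainder after V terms is exactly 1/(V + 1 + x), which tends to 0, so the full series reproduces 1/(1 + x).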


Let us suppose now that the function $f_0(x)$ defined in the interval

$$0 \leq x \leq 1$$

possesses a derivative everywhere in this interval and let $\mu_0$ be an upper
bound of $|f_0'(x)|$ while M is an upper bound of $|(1 + x) f_0(x)|$. Then by
property (b)

$$|f_n(x)| \leq M; \quad |f_{2n}(x)| \leq M; \quad |f_{3n}(x)| \leq M, \ldots$$

The function $f_n(x)$ represented by the series

$$f_n(x) = \sum f_0(u) \frac{1}{(Q_n + x Q_{n-1})^2}$$

where u stands for

$$\frac{P_n + x P_{n-1}}{Q_n + x Q_{n-1}}$$
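has a derivative; for the series obtained by a formal differentiation

$$f_n'(x) = \sum f_0'(u) \frac{(-1)^n}{(Q_n + x Q_{n-1})^4} - 2 \sum f_0(u) \frac{Q_{n-1}}{(Q_n + x Q_{n-1})^3}$$

is uniformly convergent and represents $f_n'(x)$. Now

$$\frac{Q_{n-1}}{Q_n + x Q_{n-1}} \leq 1$$

and

$$\frac{1}{(Q_n + x Q_{n-1})^2} < \frac{1}{Q_n^2} < \frac{2}{Q_n (Q_n + Q_{n-1})}.$$

Hence

$$\left| 2 \sum f_0(u) \frac{Q_{n-1}}{(Q_n + x Q_{n-1})^3} \right| < 4M \sum \frac{1}{Q_n (Q_n + Q_{n-1})} = 4M$$

by virtue of (2). On the other hand, the inequality

$$Q_n (Q_n + Q_{n-1}) = (v_n Q_{n-1} + Q_{n-2})[(v_n + 1) Q_{n-1} + Q_{n-2}] > 2 Q_{n-1} (Q_{n-1} + Q_{n-2})$$

holding for $n \geq 2$ together with an evident inequality

$$Q_1 (Q_1 + Q_0) \geq 2$$

shows that

$$Q_n (Q_n + Q_{n-1}) > 2^n \quad (n \geq 2).$$

Thus

$$(Q_n + x Q_{n-1})^4 > Q_n^2 \cdot Q_n^2 > \tfrac{1}{4} [Q_n (Q_n + Q_{n-1})]^2 > 2^{n-2} Q_n (Q_n + Q_{n-1})$$

and consequently

$$\left| \sum f_0'(u) \frac{(-1)^n}{(Q_n + x Q_{n-1})^4} \right| < \frac{\mu_0}{2^{n-2}}.$$

Hence, we may conclude that

$$\mu_1 = \frac{\mu_0}{2^{n-2}} + 4M$$

is an upper bound of $|f_n'(x)|$. Similarly, starting with the second equation
in (d), we find that

$$\mu_2 = \frac{\mu_1}{2^{n-2}} + 4M$$

is an upper bound of $|f_{2n}'(x)|$, and so forth. In general, the recurrence
relation

$$\mu_k = \frac{\mu_{k-1}}{2^{n-2}} + 4M \quad (k = 1, 2, 3, \ldots)$$

determines upper bounds of

$$|f_n'(x)|, \quad |f_{2n}'(x)|, \quad |f_{3n}'(x)|, \ldots$$

It is easy to see that in general

$$\mu_k < \frac{\mu_0}{2^{k(n-2)}} + \frac{4M}{1 - 2^{-(n-2)}}$$

so that for sufficiently large n

$$\mu_k < 5M.$$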

5. Main Inequalities. Let

$$\varphi_0(x) = f_0(x) - \frac{m_0}{1 + x}.$$

Then

$$f_n(x) - \frac{m_0}{1 + x} = \varphi_n(x) = \sum \varphi_0(u) \frac{1}{(Q_n + x Q_{n-1})^2}.$$

Since the intervals $\delta$ defined at the end of Sec. 2 do not overlap and cover
completely the whole interval (0, 1), we may write:

$$l = \int_0^1 \varphi_0(x)\,dx = \sum \int_{\delta} \varphi_0(x)\,dx = \sum \varphi_0(u_1) \frac{1}{Q_n (Q_n + Q_{n-1})},$$

the latter part following from the mean value theorem and $u_1$ being a
number contained within the interval $\delta$. By subtraction we find

$$f_n(x) - \frac{m_0}{1 + x} - \frac{l}{2} > \frac{1}{2} \sum [\varphi_0(u) - \varphi_0(u_1)] \frac{1}{Q_n (Q_n + Q_{n-1})}$$

and, since both u and $u_1$ belong to the same interval $\delta$,

$$|\varphi_0(u) - \varphi_0(u_1)| < \frac{\mu_0 + m_0}{Q_n (Q_n + Q_{n-1})} < \frac{\mu_0 + m_0}{2^n}.$$

Consequently,

$$f_n(x) - \frac{m_0}{1 + x} - \frac{l}{2} > -\frac{\mu_0 + m_0}{2^{n+1}}$$

and a fortiori

$$f_n(x) > \frac{m_0 + \frac{1}{2} l - 2^{-n} (\mu_0 + m_0)}{1 + x}.$$

It follows that

$$(5) \qquad m_1 \geq m_0 + \tfrac{1}{2} l - 2^{-n} (\mu_0 + m_0).^*$$

In a similar way, considering the function

$$\psi_0(x) = \frac{M_0}{1 + x} - f_0(x)$$

and setting

$$l_1 = \int_0^1 \psi_0(x)\,dx,$$

we shall have

$$f_n(x) < \frac{M_0 - \frac{1}{2} l_1 + 2^{-n} (\mu_0 + M_0)}{1 + x}$$

whence

$$(6) \qquad M_1 \leq M_0 - \tfrac{1}{2} l_1 + 2^{-n} (\mu_0 + M_0).$$

Further, from (5) and (6)

$$M_1 - m_1 \leq M_0 - m_0 + 2^{-n+1} (\mu_0 + M_0) - \tfrac{1}{2} (l + l_1).$$

But

$$\tfrac{1}{2} (l + l_1) = \tfrac{1}{2} \log 2 \cdot (M_0 - m_0) = (1 - k)(M_0 - m_0); \quad k < 0.66,$$

so that finally

$$M_1 - m_1 < k (M_0 - m_0) + 2^{-n+1} (\mu_0 + M_0).$$

Starting with $f_n(x), f_{2n}(x), \ldots$ instead of $f_0(x)$, in a similar way we find

$$M_2 - m_2 < k (M_1 - m_1) + 2^{-n+1} (\mu_1 + M_1)$$

$$M_3 - m_3 < k (M_2 - m_2) + 2^{-n+1} (\mu_2 + M_2)$$

$$\cdots$$

$$M_n - m_n < k (M_{n-1} - m_{n-1}) + 2^{-n+1} (\mu_{n-1} + M_{n-1}).$$

From these inequalities it follows that

$$M_n - m_n < (M_0 - m_0) k^n + 2^{-n+1} [\mu_0 k^{n-1} + \mu_1 k^{n-2} + \cdots + \mu_{n-1} + M_0 k^{n-1} + M_1 k^{n-2} + \cdots + M_{n-1}].$$

Without losing anything in generality, we may suppose that $f_0(x)$ is a
positive function. Then

* $M_1$, $m_1$ are used here with the same meaning as $M_n$, $m_n$ in Sec. 4.
$$M_k \leq M_0, \quad \mu_k < 5M_0 \quad (k = 1, 2, 3, \ldots)$$

at least for sufficiently large n. Owing to these inequalities we shall have

$$(7) \qquad M_n - m_n < (M_0 - m_0) k^n + \mu_0 \left( \frac{k}{2} \right)^{n-1} + \frac{6 M_0}{(1 - k) 2^{n-1}}.$$

This inequality shows that sequences

$$M_0 \geq M_1 \geq M_2 \geq \cdots$$
$$m_0 \leq m_1 \leq m_2 \leq \cdots$$

approach a common limit a. The following method can be used to find
the value of this limit. Let N be an arbitrary sufficiently large integer
and n the integer defined by

$$n^2 \leq N < (n + 1)^2.$$

Then

$$\frac{m_n}{1 + x} \leq f_{n^2}(x) \leq \frac{M_n}{1 + x}$$

and therefore

$$\frac{m_n}{1 + x} \leq f_N(x) \leq \frac{M_n}{1 + x}.$$

The last inequality permits presenting $f_N(x)$ thus:

$$(8) \qquad f_N(x) = \frac{a}{1 + x} + \theta (M_n - m_n); \quad |\theta| < 1,$$

whence

$$\int_0^1 f_N(x)\,dx = \int_0^1 f_0(x)\,dx = a \log 2 + \theta' (M_n - m_n), \quad |\theta'| < 1,$$

and, because $M_n - m_n$ ultimately becomes as small as we please in
absolute value,

$$a \log 2 = \int_0^1 f_0(x)\,dx.$$

Equation (8) shows clearly that the sequence of functions

$$f_0(x), f_1(x), f_2(x), \ldots$$

defined by the recurrence relation (4) approaches uniformly the limit
function

$$\frac{a}{1 + x}$$

where

$$a = \frac{1}{\log 2} \int_0^1 f_0(x)\,dx.$$
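6. Solution of the Gaussian Problem. It suffices to apply the preceding
considerations to the case $f_0(x) = p_0(x) = 1$. In this case $M_0 = 2$,
$m_0 = 1$, $\mu_0 = 0$ and

$$a = \frac{1}{\log 2}.$$

Consequently,

$$p_N(x) = \frac{1}{(1 + x) \log 2} + \theta \left[ k^n + \frac{3}{(1 - k) 2^{n-3}} \right], \quad |\theta| < 1,$$

where $n = [\sqrt{N}]$. It suffices to integrate this expression between limits
0 and t < 1 to find

$$P_N(t) = \frac{\log (1 + t)}{\log 2} + \theta' \left[ k^n + \frac{3}{(1 - k) 2^{n-3}} \right], \quad |\theta'| < 1.$$

As $N \to \infty$

$$P_N(t) \to \frac{\log (1 + t)}{\log 2}$$

as stated by Gauss. Moreover,

$$\left| P_N(t) - \frac{\log (1 + t)}{\log 2} \right| < k^n + \frac{3}{(1 - k) 2^{n-3}}$$

for sufficiently large, but finite N.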
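
The uniform approach of $p_n(x)$ to $1/[(1 + x) \log 2]$ can be observed numerically by iterating the recurrence on a grid. What follows is a rough sketch under stated assumptions, not Kuzmin's argument: the function is tabulated on a uniform grid with linear interpolation, the infinite series is cut off after a fixed number of terms, and the remainder is replaced by an assumed simple tail estimate. All names and parameter values are illustrative.

```python
import math


def kuzmin_step(values, terms=400):
    """Apply the recurrence once to a function tabulated on a uniform grid over [0, 1]."""
    grid_n = len(values) - 1
    h = 1.0 / grid_n

    def f(u):
        # Linear interpolation of the tabulated function at a point u of [0, 1].
        pos = u / h
        i = min(int(pos), grid_n - 1)
        w = pos - i
        return (1.0 - w) * values[i] + w * values[i + 1]

    result = []
    for j in range(grid_n + 1):
        x = j * h
        total = 0.0
        for v in range(1, terms + 1):
            t = 1.0 / (v + x)
            total += f(t) * t * t
        # Assumed tail estimate: beyond `terms` the argument is close to 0 and
        # the sum of 1/(v + x)^2 is close to 1/(terms + x + 1/2).
        total += values[0] / (terms + x + 0.5)
        result.append(total)
    return result


def max_deviation(steps, grid_n=100):
    """Largest gap on the grid between p_steps(x) and 1/((1 + x) log 2), starting from p_0 = 1."""
    values = [1.0] * (grid_n + 1)
    for _ in range(steps):
        values = kuzmin_step(values)
    return max(abs(values[j] - 1.0 / ((1.0 + j / grid_n) * math.log(2)))
               for j in range(grid_n + 1))


if __name__ == "__main__":
    for steps in (1, 2, 4, 6):
        print(steps, max_deviation(steps))
```

The printed deviations shrink quickly with the number of steps, in line with the geometric rate of the bound obtained above; the grid and truncation errors of this sketch set a floor below which the numbers are no longer meaningful.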



Table of the Probability Integral

$$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_0^z e^{-\frac{x^2}{2}}\,dx$$

  z     Φ(z)       z     Φ(z)       z     Φ(z)       z     Φ(z)

 0.00  0.0000     0.65  0.2422     1.30  0.4032     1.95  0.4744
 0.01  0.0040     0.66  0.2454     1.31  0.4049     1.96  0.4750
 0.02  0.0080     0.67  0.2486     1.32  0.4066     1.97  0.4756
 0.03  0.0120     0.68  0.2517     1.33  0.4082     1.98  0.4761
 0.04  0.0160     0.69  0.2549     1.34  0.4099     1.99  0.4767
 0.05  0.0199     0.70  0.2580     1.35  0.4115     2.00  0.4772
 0.06  0.0239     0.71  0.2611     1.36  0.4131     2.02  0.4783
 0.07  0.0279     0.72  0.2642     1.37  0.4147     2.04  0.4793
 0.08  0.0319     0.73  0.2673     1.38  0.4162     2.06  0.4803
 0.09  0.0359     0.74  0.2703     1.39  0.4177     2.08  0.4812
 0.10  0.0398     0.75  0.2734     1.40  0.4192     2.10  0.4821
 0.11  0.0438     0.76  0.2764     1.41  0.4207     2.12  0.4830
 0.12  0.0478     0.77  0.2794     1.42  0.4222     2.14  0.4838
 0.13  0.0517     0.78  0.2823     1.43  0.4236     2.16  0.4846
 0.14  0.0557     0.79  0.2852     1.44  0.4251     2.18  0.4854
 0.15  0.0596     0.80  0.2881     1.45  0.4265     2.20  0.4861
 0.16  0.0636     0.81  0.2910     1.46  0.4279     2.22  0.4868
 0.17  0.0675     0.82  0.2939     1.47  0.4292     2.24  0.4875
 0.18  0.0714     0.83  0.2967     1.48  0.4306     2.26  0.4881
 0.19  0.0753     0.84  0.2995     1.49  0.4319     2.28  0.4887
 0.20  0.0793     0.85  0.3023     1.50  0.4332     2.30  0.4893
 0.21  0.0832     0.86  0.3051     1.51  0.4345     2.32  0.4898
 0.22  0.0871     0.87  0.3078     1.52  0.4357     2.34  0.4904
 0.23  0.0910     0.88  0.3106     1.53  0.4370     2.36  0.4909
 0.24  0.0948     0.89  0.3133     1.54  0.4382     2.38  0.4913
 0.25  0.0987     0.90  0.3159     1.55  0.4394     2.40  0.4918
 0.26  0.1026     0.91  0.3186     1.56  0.4406     2.42  0.4922
 0.27  0.1064     0.92  0.3212     1.57  0.4418     2.44  0.4927
 0.28  0.1103     0.93  0.3238     1.58  0.4429     2.46  0.4931
 0.29  0.1141     0.94  0.3264     1.59  0.4441     2.48  0.4934
 0.30  0.1179     0.95  0.3289     1.60  0.4452     2.50  0.4938
 0.31  0.1217     0.96  0.3315     1.61  0.4463     2.52  0.4941
 0.32  0.1255     0.97  0.3340     1.62  0.4474     2.54  0.4945
 0.33  0.1293     0.98  0.3365     1.63  0.4484     2.56  0.4948
 0.34  0.1331     0.99  0.3389     1.64  0.4495     2.58  0.4951
 0.35  0.1368     1.00  0.3413     1.65  0.4505     2.60  0.4953
 0.36  0.1406     1.01  0.3438     1.66  0.4515     2.62  0.4956
 0.37  0.1443     1.02  0.3461     1.67  0.4525     2.64  0.4959
 0.38  0.1480     1.03  0.3485     1.68  0.4535     2.66  0.4961
 0.39  0.1517     1.04  0.3508     1.69  0.4545     2.68  0.4963
 0.40  0.1554     1.05  0.3531     1.70  0.4554     2.70  0.4965
 0.41  0.1591     1.06  0.3554     1.71  0.4564     2.72  0.4967
 0.42  0.1628     1.07  0.3577     1.72  0.4573     2.74  0.4969
 0.43  0.1664     1.08  0.3599     1.73  0.4582     2.76  0.4971
 0.44  0.1700     1.09  0.3621     1.74  0.4591     2.78  0.4973
 0.45  0.1736     1.10  0.3643     1.75  0.4599     2.80  0.4974
 0.46  0.1772     1.11  0.3665     1.76  0.4608     2.82  0.4976
 0.47  0.1808     1.12  0.3686     1.77  0.4616     2.84  0.4977
 0.48  0.1844     1.13  0.3708     1.78  0.4625     2.86  0.4979
 0.49  0.1879     1.14  0.3729     1.79  0.4633     2.88  0.4980
 0.50  0.1915     1.15  0.3749     1.80  0.4641     2.90  0.4981
 0.51  0.1950     1.16  0.3770     1.81  0.4649     2.92  0.4982
 0.52  0.1985     1.17  0.3790     1.82  0.4656     2.94  0.4984
 0.53  0.2019     1.18  0.3810     1.83  0.4664     2.96  0.4985
 0.54  0.2054     1.19  0.3830     1.84  0.4671     2.98  0.4986
 0.55  0.2088     1.20  0.3849     1.85  0.4678     3.00  0.49865
 0.56  0.2123     1.21  0.3869     1.86  0.4686     3.20  0.49931
 0.57  0.2157     1.22  0.3888     1.87  0.4693     3.40  0.49966
 0.58  0.2190     1.23  0.3907     1.88  0.4699     3.60  0.499841
 0.59  0.2224     1.24  0.3925     1.89  0.4706     3.80  0.499928
 0.60  0.2257     1.25  0.3944     1.90  0.4713     4.00  0.499968
 0.61  0.2291     1.26  0.3962     1.91  0.4719     4.50  0.499997
 0.62  0.2324     1.27  0.3980     1.92  0.4726     5.00  0.4999997
 0.63  0.2357     1.28  0.3997     1.93  0.4732
 0.64  0.2389     1.29  0.4015     1.94  0.4738
INDEX


A 

Arrangements, 18 

B 

Bayes’ formula (theorem), 61 
Bernoulli criterion, 5 
Bernoulli theorem, 96 
Bernoulli trials, 45 
Bernstein, S., inequality, 205 
Bertrand’s paradox, 251 
Buffon’s needle problem, 113, 251 
Barbier’s solution of, 253 

C 

Cantelli’s theorem, 101 
Cauchy’s distribution, 243, 275 
Characteristic function, composition of, 
275 

of distribution, 240, 264 
Coefficient, correlation, 339 
divergence, 212, 214, 216 
Combinations, 18 

Compound probability, theorem of, 31 
Continued fractions, 358, 361, 396 
Markoff’s method of, 52 
Continuous variables, 235 
Correlation, normal (see Normal correlation)

Correlation coefficient, distribution of, 
339 

D 

Difference equations, ordinary, 75, 78 
partial, 84 

Dispersion, definition, 172 
of sums, 173 

Distribution, Cauchy’s, 243, 275 
characteristic function of, 264 
of correlation coefficient, 339 


Distribution, determination of, 271 
equivalent point, 369 
general concept of, 263 
normal (Gaussian), 243 
Poisson’s, 279 
“Student’s,” 339

Distribution function of probability, 
239, 263 

Divergence coefficient, empirical, 212 
Lexis’ case, 214
Poisson’s case, 214 
theoretical, 212 
Tschuprow’s theorem, 216 

E 

Elementary errors, hypothesis of, 296 
Ellipses of equal probability, 311, 328 
Estimation of error term, 295 
Euler’s summation formula, 177, 201, 
303, 347 

Events, compound, 29 
contingent, 3 
dependent, 33 
equally likely, 4, 5, 7 
exhaustive, 6 
future, 65 
incompatible, 37 
independent, 32, 33 
mutually exclusive, 6, 27 
opposite, 29 

Expectation, mathematical, 161 
of a product, 171 
of a sum, 165 

F 

Factorials, 349 
Fourier theorem, 241 
French lottery, 19, 108 
Frequency, 96 

Fundamental lemma (see Limit theorem)
Fundamental theorem (see Tshebysheff-Markoff theorem)

G

Gaussian distribution, 243 
Gaussian problem, 396 
Generating function of probabilities, 
47, 78, 85, 89, 93, 94 

H 

Hermite polynomials, 72 
Hypothesis of elementary errors, 296 

I 

Independence, definition of, 32, 33 
K 

Khintchine (see Law of large numbers)
Kolmogoroff (see Law of large numbers;
Strong law of large numbers)

L 

Lagrange’s series, 84, 150 
Laplace-Liapounoff (see Limit theorem)
Laplace’s problem, 255 
Laurent’s series, 87, 148 
Law of large numbers, generalization 
by Markoff, 191 

for identical variables (Khintchine), 
195 

Kolmogoroff’s lemma, 201 
theorem, 185 
Tshebysheff’s lemma, 182 
Law of repeated logarithm, 204 
Law of succession, 69 
Lexis’ case, 214 

Liapounoff condition (see Limit theorem)
Liapounoff inequality, 265 
Limit theorem, Bernoullian case, 131 
for sums of independent vectors, 318, 
323, 325, 326 
fundamental lemma, 284 
Laplace-Liapounoff, 284 
Line of regression, 314 
Lottery, French (see French lottery)

M 

Marbe’s problem, 231 
Markoff’s theorem, infinite dispersion, 
191 


Markoff’s theorem, for simple chains, 301 
Markoff-Tshebysheff theorem (see
Tshebysheff-Markoff theorem)
Mathematical expectation, definition of, 
161 

of a product, 171 
of a sum, 165 

Mathematical probability, definition of, 
6 

Moments, absolute, 240, 264 
inequalities for, 264 
method of (Markoff’s), 356 ff.

N 

Normal correlation, 313 
origin of, 327 

Normal distribution, Gaussian, 243 
two-dimensional, 308 

P 

Pearson’s “χ²-test,” 327
Permutations, 18 
Point, of continuity, 261, 356 
of increase, 262, 356 
Poisson series, 182, 293 
Poisson’s case, 214 
Poisson’s distribution, 279 
Poisson’s formula, 137 
Poisson’s theorem, 208, 294 
Polynomials, Hermite (see Hermite)
Probability, approximate evaluation of, 
by Markoff’s method, 52 
compound, 29, 31 
conditional, 33 
definition (classical) of, 6 
total, 27, 28 

Probability integral, 128 
table of, 407 

R 

Relative frequency, 96 
Runs, problem of, 77 

S 

Simple chains, 74, 223, 297 
Markoff’s theorem for, 301 
Standard deviation, 173 
Stieltjes’ integrals, 261
Stirling’s formula, 349 
Stochastic variables, 161 
Strong law of large numbers (Kolmogoroff), 202

“Student’s” distribution, 339
T 

Table of probability integral, 407 
Tests of significance, 331 
Total probability, theorem of, 27, 28 
Trials, dependent, independent, repeated, 
44, 45 


Tschuprow (see Divergence coefficient)
Tshebysheff-Markoff theorem, fundamental, 304, 384
application, 388
Tshebysheff’s inequalities, 373
Tshebysheff’s inequality, 204
Tshebysheff’s lemma, 182 
Tshebysheff’s problem, 199 

V 

Variables, continuous, 235 
independent, 171 
stochastic, 161 
Vectors (see Limit theorem)





